ABSTRACT


In this thesis, we present two lossless compression approaches. Our Rotational Tree 
Approach (RTA) is based upon mathematics developed by Fredricksen. RTA uses the 
rotations associated with binary necklace classes to disperse source bit strings to a forest 
of Huffman encoding trees. Our Indexed Tree Approach (ITA) also uses a Huffman
forest, but disperses bit strings via a simpler mechanism based upon the first few bits of
each string. For text compression, we find RTA to be competitive with standard
Huffman encoding while ITA is generally superior by a small margin of 1% - 3%. Both
approaches owe their (limited) success to decreased modeling overhead as compared to
standard Huffman encoding. Compression results against the Canterbury Corpus test suite
and complete Java implementation code are included as appendices.
I. INTRODUCTION


A search for the string ‘Huffman Encoding’ in the academic science and 
engineering database Ei Compendex Web [1] yields the following results: 

• 1970’s database - 27 papers found 

• 1980’s database - 61 papers found 

• 1990’s database - 305 papers found 

Clearly, despite the passing of nearly 50 years since D.A. Huffman’s [2] landmark
work on lossless compression, new uses are still being found for his ideas at ever
increasing rates. The purpose of this paper is to present two previously untried
approaches to lossless compression, both of which involve the notion of multiple
Huffman encoding trees.

Our Rotational Tree Approach (RTA) is based upon mathematics developed by
Fredricksen [3,4]. RTA uses the rotations associated with binary necklace classes to
disperse source bit strings to a forest of Huffman encoding trees. Our Indexed Tree
Approach (ITA) also uses a Huffman forest, but disperses bit strings via a simpler
mechanism based upon the first few bits of each string. For text compression, we find
RTA to be very competitive with standard Huffman encoding while ITA is generally
superior by a margin of 1% - 3%.
II. COMPRESSION FOUNDATIONS


A. COMPRESSING/DECOMPRESSING 

In general, lossless data compression consists of reading a stream of symbols 
from file A, replacing each symbol with some (hopefully shorter) binary code, and 
writing the new codes to some file B. File B is now a compressed version of file A (or 
expanded, if the method was ineffective). The decompression process is the inverse. The 
codes stored in file B are read, converted to the original symbols, and written to some 
new file C. Upon successful completion of a lossless compression / decompression 
cycle, file C is identical to file A. 

B. LOSSY VS. LOSSLESS 

Data compression techniques can be divided into two broad categories - lossy and 
lossless. As the name implies, lossy compression techniques sacrifice some of the 
information content of the source file. Lossless compression techniques, on the other 
hand, preserve all the information content of the original file. Lossless techniques 
demand that the uncompressed file be bit for bit identical with the original source file. 

Lossy compression is often used on files containing digitized voice or image data. 
Lossy compression makes sense in these settings since digital audio and video data files 
are always truncations of their original analog sources. Since one doesn’t have all the 
data to begin with, it is often acceptable to sacrifice a bit more for the sake of 
compression (so long as the loss is not readily noticeable to the end user). Popular lossy 
compression schemes include JPEG (for images), MPEG (for video), and MP3 (for
audio). Lossy compression, while both useful and interesting, is not the subject of this
thesis and will not be discussed further. 

Lossless methods are used to compress documents, database records, executable 
files, and any other form of data that must be reproduced exactly. The two compression 
algorithms discussed in this paper are lossless. 

C. INFORMATION THEORY 

The theory of information developed by Claude Shannon [5] in the 1940’s 
concerns the study of the storage, processing, and transmission of information. 
Information Theory provides an objective mathematical way of measuring the 
information content of a particular symbol (within a given message) and of the whole 
message. Information content is referred to as entropy, and is generally measured in bits. 
The entropy E of an input symbol i can be calculated using 

$E(i) = \log_2(1/p(i))$,

Equation 1

where p(i) is the probability of i occurring in the message. The entropy H of an entire
message (typically a file) is then the sum of the entropies of all the individual symbols in
the message, or

$H = \sum_i \log_2(1/p(i))$,

Equation 2 

summed over all the symbols in the message. 
Consider the frequency distribution of characters in a simple 100-character message:

Character    Frequency    Entropy (bits)
A            45           1.15
E            16           2.64
H            13           2.94
R            12           3.06
G             9           3.47
F             5           4.32

Table 1

The total entropy is H = (1.15 * 45) + (2.64 * 16) + (2.94 * 13) + (3.06 * 12)
+ (3.47 * 9) + (4.32 * 5) = 221.76 bits. Shannon’s equations predict that under ideal
circumstances we should be able to represent the above message in 222 bits. We contrast
this figure with traditional ASCII encoding, which uses 8 bits per character or
8 * 100 = 800 bits to represent the data in the target file. One can immediately see how
much room for improvement via compression actually exists in a typical text file. Thus,
Shannon’s methods provide us with a lower bound against which to measure the
effectiveness of any compression scheme.
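
The calculation above can be sketched in Java. The class and method names are our own illustrative choices and are not taken from the thesis implementation. Because the sketch uses unrounded per-symbol entropies, its total comes out slightly higher (about 222.0 bits) than the 221.76 bits obtained from the two-decimal values in Table 1.

```java
class EntropySketch {
    /** Entropy in bits of one symbol occurring `count` times among `total` symbols (Equation 1). */
    static double symbolEntropy(int count, int total) {
        return Math.log((double) total / count) / Math.log(2.0);
    }

    /** Total entropy in bits of a message described by its symbol counts (Equation 2). */
    static double messageEntropy(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double bits = 0.0;
        for (int c : counts) {
            bits += c * symbolEntropy(c, total);   // each occurrence contributes E(i)
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] counts = {45, 16, 13, 12, 9, 5};     // frequencies from Table 1
        System.out.printf("total entropy: %.2f bits%n", messageEntropy(counts));
    }
}
```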

D. COMPRESSION STATISTICS 

There are many ways to express the size reduction of a file after it has been 
compressed. In this paper we define the percentage of decrease (POD) via the 
formula 

POD = (change in file size / original file size) * 100,

Equation 3 

where the change in file size = original file size - compressed file size. 

In the event that the compression fails (bloats the file) the resulting POD will be 
negative. 
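
As a small illustration of Equation 3, the following Java sketch computes the POD from two file sizes. The method name is hypothetical; the sizes in `main` are the 800-bit ASCII message and the 222-bit entropy estimate from the previous section.

```java
class PodSketch {
    /** Percentage of decrease; negative when the "compressed" file is larger than the original. */
    static double percentageOfDecrease(long originalSize, long compressedSize) {
        return 100.0 * (originalSize - compressedSize) / originalSize;
    }

    public static void main(String[] args) {
        System.out.println(percentageOfDecrease(800, 222));  // ideal case for the sample message
        System.out.println(percentageOfDecrease(100, 120));  // a bloated file gives a negative POD
    }
}
```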
E. STATISTICAL VS. DICTIONARY MODELING


A compression model is a collection of data and rules used to map each input 
symbol to an output code. The term symbol is intentionally vague, as a symbol can 
represent an arbitrarily long string of bits deemed useful by the chosen model. A 
compression program uses its model to accurately define the probabilities for each input 
symbol. The model allows the program to produce an appropriate code for each symbol. 
Lossless compression is generally implemented using either statistical or dictionary 
modeling. 

In statistical modeling the compression program analyzes the source file and finds
the most common symbols. Typically, statistical modeling uses fixed-length symbols.
Then each input symbol is assigned a replacement code. Replacement codes are
normally short for the more common symbols and longer for the less common ones. An
output file is then built by representing each symbol in the source file by its replacement
code. When all the replacements are made the output file should be shorter than the 
source file since the more common symbols all get shorter representations. Even though 
some symbols get longer representations, they occur only infrequently in the text, thus 
they have a limited effect on the output file length. 

In dictionary modeling a single code is used to replace a string of fixed-length
symbols. Equivalently, one could say that symbols are of variable length. Regardless of 
the interpretation, a program using a dictionary model merely builds a list of the most 
frequent words, phrases, bit strings, or input symbols found in the source file. Each entry 
in the dictionary is then assigned a fixed-length replacement code, which can be thought 
of as simply an index into the dictionary. 
The compression program uses its dictionary much like the authors of a thesis use
the list of references at the end of the paper. Instead of writing out the full citation for a 
referenced paper, a number is used to indicate the citation’s index within the 
bibliography. The reader of the thesis decompresses the citation by finding the indicated 
position within the list of references and mentally replaces the number with the full 
citation. Similarly, the dictionary compression program replaces input symbols with their 
assigned replacement codes (indices), and the decompression program does the inverse. 

To draw a distinction between statistical and dictionary modeling, note that in 
statistical modeling fixed-length symbols are replaced by variable-length codes, but in 
dictionary modeling variable-length symbols are replaced by fixed-length codes. 

F. STATIC VS. ADAPTIVE MODELS 

The models in the previous section were described in a ‘static’ way, i.e., each 
model was created only after the entire source file was completely analyzed. This means 
two passes through the source file are required - the first to build the model and the 
second to do the actual symbol replacement. If a file is compressed using a static
model then the decompression program must also have access to the same model in order 
to transform the codes in the compressed file back into the symbols from the original 
source file. In many cases, this means that the model will have to be included as part of 
the compressed file. Model overhead information is normally located at the beginning of 
the compressed file in an area referred to as the file header. Header overhead is one of 
the major performance burdens of static methods. 

An alternative to static modeling is adaptive modeling (sometimes also referred to
as dynamic modeling). In adaptive modeling the compression program updates the
model as the output file is being created. This means that the output codes produced by
the compression program might have different meaning depending upon their relative 
position in the output file. 



Figure 1: General Adaptive Compression [6] 

There are several advantages to adaptive modeling. First, only one pass is 
required to encode the source file. Second, adaptive techniques quickly tune themselves 
to the file they are encoding and are thus better able to compress files containing 
markedly different types of data. Third, modeling information need not be explicitly 
included in the output file header as it is with static techniques. Instead, the adaptive 
decompression program simply begins with the same initial model as the compression 
program did, and then updates its model each time it reads a code from the compressed 
file. Thus, although the decompression program’s model is in a constant state of change, 
it is always identical to the compression program’s model at the same point in the 
process. Obviously, adaptive techniques rely on the compression and decompression 
programs both beginning with the same model, and both updating the model in exactly
the same fashion. The following diagram depicts the decompression process. 


[Diagram labels: Codes, Read Input Code, Decode Symbol, Update Model, Output Symbol, Symbols]


Figure 2: General Adaptive Decompression [6] 
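
To make the synchronization argument concrete, here is a minimal Java sketch in which the "model" is simply an ordered list of symbols that is updated by moving the most recent symbol to the front. This move-to-front scheme is our own stand-in chosen for brevity; it is not one of the methods developed in this thesis, and all names are hypothetical. The sketch assumes every message symbol appears in the initial alphabet.

```java
import java.util.ArrayList;
import java.util.List;

class AdaptiveSketch {
    /** Compressor and decompressor both start from this same initial model. */
    static List<Character> initialModel(String alphabet) {
        List<Character> model = new ArrayList<>();
        for (char ch : alphabet.toCharArray()) {
            model.add(ch);
        }
        return model;
    }

    static int[] encode(String message, String alphabet) {
        List<Character> model = initialModel(alphabet);
        int[] codes = new int[message.length()];
        for (int i = 0; i < message.length(); i++) {
            char symbol = message.charAt(i);
            int code = model.indexOf(symbol);   // encode the symbol with the current model
            codes[i] = code;
            model.remove(code);                 // update the model: move symbol to the front
            model.add(0, symbol);
        }
        return codes;
    }

    static String decode(int[] codes, String alphabet) {
        List<Character> model = initialModel(alphabet);
        StringBuilder out = new StringBuilder();
        for (int code : codes) {
            char symbol = model.remove(code);   // decode the code with the current model
            out.append(symbol);
            model.add(0, symbol);               // apply exactly the same update as the encoder
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] codes = encode("AABA", "ABC");
        System.out.println(java.util.Arrays.toString(codes));
        System.out.println(decode(codes, "ABC"));
    }
}
```

In this sketch the same code value can stand for different symbols at different positions in the output, which is the position-dependent meaning described above; no model is stored in the output.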

G. SURVEY OF LOSSLESS COMPRESSION TECHNIQUES 

There are literally hundreds of serious data compression programs available on 
the web. The interested reader is referred to the excellent compression resource, “The 
Data Compression Library” [7]. We do not survey these compression implementations. 
Instead, we present the foundational techniques from which the current generation of 
compression programs has sprung. In this way we hope to paint a clear picture of the 
field of compression, and to show where our methods fit into the field. 

A word of warning for those who do decide to explore the world of compression 
via the Internet: The vast majority of commercial applications are enhancements and 
hybrids of the more basic techniques, which follow in this section. These applications are 
typically tuned for specific data or specific performance (speed, memory footprint, or file 
size). Few limit themselves to any one compression technique. Instead, multiple 
techniques are used in an effort to squeeze the maximum possible compression out of
source files. Naturally, this leads to increased algorithm and program complexity. In 
short, the Internet may not be the best place to learn the fundamentals of compression. 
Two excellent books on the subject are “The Data Compression Book” [6] and 
“Compression Algorithms for Real Programmers” [8]. 

Most of the following techniques can use either a static or adaptive (dynamic) 
model. Interestingly, most statistical approaches seem to have been originally conceived 
in a static manner and later explored using adaptive methods. On the other hand, 
dictionary techniques typically began as adaptive methods, and only later began to be 
investigated statically. The techniques that follow are presented using the method with 
which they were originally developed. 

1. Run Length 

Long runs of repeated characters are one of the simplest forms of redundancy 
found in a file [9,10]. Consider the string of repeated characters, 
AAAAAAAABBBCCCCCCCDDDDEEEEEEEEE. 

A run length encoding scheme might represent the string as 
8A3B7C4D9E. 

Unfortunately, typical text does not exhibit these types of patterns. There are 
some applications of this technique - the runs of black and white pixels on a fax image 
for example, but in general it is not particularly suited as a stand-alone method. Typical 
text compression using run length encoding is less than 5%. Of course, greater
compression results can be achieved with specific data (e.g., fax images). 
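
A run length encoder of the kind described can be sketched in a few lines of Java. The count-then-character output format follows the example above; the class name is our own, and the sketch assumes the input contains no digit characters (otherwise the output could not be decoded unambiguously).

```java
class RunLengthSketch {
    /** Replaces each run of a repeated character with its length followed by the character. */
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            int run = 1;
            while (i + run < text.length() && text.charAt(i + run) == ch) {
                run++;                           // extend the run while the character repeats
            }
            out.append(run).append(ch);
            i += run;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("AAAAAAAABBBCCCCCCCDDDDEEEEEEEEE"));
        System.out.println(encode("text"));      // no runs: the output is longer than the input
    }
}
```

The second call shows why the method performs poorly on ordinary text: every run of length one doubles in size.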
2. Shannon-Fano


This technique uses a static statistical model to build a binary tree in a top-down 
fashion. Each leaf of the tree represents an input symbol, while the path from the root 
node to a leaf gives the leaf’s replacement code. The replacement codes generated by the
algorithm share the following properties: 

• Codes are of variable length. 

• Codes for high-probability symbols are shorter than codes for low-probability symbols.

• No code is a prefix of any other code from the same tree. 

The Shannon-Fano algorithm [11] proceeds through the following steps: 

1. Build a frequency distribution for the input symbols. 

2. Sort the list of symbols in descending order by frequency. 

3. Initialize the replacement code for each symbol to be null. 

4. Divide the list in two parts with the total frequency count of the upper half
being as close as possible to the total frequency count of the lower half.

5. Append a 0 to the replacement code of each symbol in the upper half and a 1 to
the replacement code of each symbol in the lower half.

6. Recursively apply steps 4 and 5 to each partial list until each list contains 
exactly one symbol. 

Although an excellent technique, Shannon-Fano encoding has generally been
replaced by Huffman encoding, which is marginally superior. 
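
Steps 3 through 6 can be sketched recursively in Java as follows. This is a minimal illustration under our own assumptions: the frequencies are already sorted in descending order (steps 1 and 2), codes are built as strings of '0' and '1' characters, and the names are hypothetical. A list with a single symbol receives the empty code.

```java
import java.util.Arrays;

class ShannonFanoSketch {
    /** Assigns code bits to the symbols in freq[lo..hi), which must be sorted descending. */
    static void assign(int[] freq, String[] codes, int lo, int hi) {
        if (hi - lo <= 1) {
            return;                                   // a list of one symbol is finished
        }
        int total = 0;
        for (int i = lo; i < hi; i++) {
            total += freq[i];
        }
        int best = lo + 1;
        int bestDiff = Integer.MAX_VALUE;
        int upper = 0;
        for (int split = lo + 1; split < hi; split++) {
            upper += freq[split - 1];
            int diff = Math.abs(total - 2 * upper);   // |upper half - lower half|
            if (diff < bestDiff) {
                bestDiff = diff;
                best = split;
            }
        }
        for (int i = lo; i < hi; i++) {
            codes[i] += (i < best) ? "0" : "1";       // step 5
        }
        assign(freq, codes, lo, best);                // step 6 on the upper half
        assign(freq, codes, best, hi);                // step 6 on the lower half
    }

    static String[] build(int[] sortedFreq) {
        String[] codes = new String[sortedFreq.length];
        Arrays.fill(codes, "");                       // step 3: start with null codes
        assign(sortedFreq, codes, 0, sortedFreq.length);
        return codes;
    }

    public static void main(String[] args) {
        int[] freq = {45, 16, 13, 12, 9, 5};          // the 100-character sample message
        System.out.println(Arrays.toString(build(freq)));
    }
}
```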

3. Huffman 

Standard Huffman encoding [2] uses a static statistical model to build a binary 
tree in a bottom-up fashion. The codes generated by the tree share all the properties of
those generated by Shannon-Fano and have the added advantage of being
provably optimal. Optimal in this case means that there isn’t a better set of
integral-length replacement codes than those generated by the algorithm.

The Huffman algorithm proceeds as follows: 

1. Build a frequency distribution for the input symbols. 

2. Create a collection C of single-node trees, each associated with a symbol and
its frequency.

3. Let t1 and t2 be two trees with the lowest frequencies in collection C. Replace t1
and t2 with a single tree t3, formed by attaching t1 and t2 as the left and right descendants
of a new node. The frequency associated with t3 is the sum of the frequencies of t1 and
t2.

4. While more than one tree remains in C, repeat step 3.


Though optimal, Huffman encoding does not quite reach Shannon’s predicted 
entropy limit. Consider the breakdown of our imaginary 100-character message and its 
associated Huffman tree shown below: 


Symbol   Frequency   Entropy   Total Entropy   Huffman Code     Total Huffman Code
                     (bits)    (bits)          Length (bits)    Length (bits)
A        45          1.15      51.75           1                45
E        16          2.64      42.24           3                48
H        13          2.94      38.22           3                39
R        12          3.06      36.72           3                36
G         9          3.47      31.23           4                36
F         5          4.32      21.60           4                20
Total   100                    221.76                           224

Table 2: Frequency Table for 100-Character Message
Figure 3: Huffman Tree

We see that while the total information content of the message is 221.76 bits, 
Huffman requires 224 bits to actually encode it. While this is not bad, it is not truly 
optimal. The deviation from the predicted value is due to the inability of the method to 
accurately represent the fractional entropy of each symbol. Thus, while there is no better 
integral length encoding scheme, there is still room to improve on Huffman’s algorithm. 
Note that Shannon-Fano encoding will also require 224 bits for this file. 
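
The tree construction and the resulting code lengths can be reproduced with the following Java sketch. It is an illustration only, not the implementation given in the appendices: the `Node` class and method names are our own, a `java.util.PriorityQueue` stands in for the collection C, and at least one symbol is assumed. When frequencies tie, a different but equally good tree may result.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class HuffmanSketch {
    static final class Node {
        final int freq;
        final int symbol;          // index of the symbol for a leaf, -1 for an internal node
        final Node left;
        final Node right;

        Node(int freq, int symbol, Node left, Node right) {
            this.freq = freq;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    /** Returns the Huffman code length, in bits, of each symbol. */
    static int[] codeLengths(int[] freq) {
        PriorityQueue<Node> trees =
                new PriorityQueue<>((Node a, Node b) -> Integer.compare(a.freq, b.freq));
        for (int s = 0; s < freq.length; s++) {
            trees.add(new Node(freq[s], s, null, null));         // single-node trees
        }
        while (trees.size() > 1) {
            Node t1 = trees.poll();                              // two lowest-frequency trees
            Node t2 = trees.poll();
            trees.add(new Node(t1.freq + t2.freq, -1, t1, t2));  // merged tree t3
        }
        int[] lengths = new int[freq.length];
        fill(trees.poll(), 0, lengths);
        return lengths;
    }

    /** A leaf's depth in the tree is the length of its replacement code. */
    static void fill(Node node, int depth, int[] lengths) {
        if (node.left == null) {
            lengths[node.symbol] = depth;
            return;
        }
        fill(node.left, depth + 1, lengths);
        fill(node.right, depth + 1, lengths);
    }

    public static void main(String[] args) {
        int[] freq = {45, 16, 13, 12, 9, 5};                     // A, E, H, R, G, F
        int[] lengths = codeLengths(freq);
        int total = 0;
        for (int s = 0; s < freq.length; s++) {
            total += freq[s] * lengths[s];
        }
        System.out.println(Arrays.toString(lengths) + " total bits: " + total);
    }
}
```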
4. Arithmetic


Arithmetic coding [12, 13, 14] can use either a static or an adaptive statistical 
model. It completely discards Huffman’s idea of replacing an input symbol with a 
specific code, and instead replaces the entire string of the input symbols in a message 
with a single floating point number between 0 and 1. For this reason it does not suffer 
the discrete code-length limitations of Huffman encoding. Arithmetic encoding uses a 
probability distribution to assign a proportionate range between 0 and 1 to each input 
symbol. It is typically able to approach the Shannon theoretical minimum - sometimes 
yielding improvements of up to 10% over standard Huffman encoding. The major 
drawback of the technique is that it is computationally intensive and therefore slow. 

Arithmetic Encoding Algorithm: 

1. Build a probability table for all the input symbols.

2. Establish proportionate lower bounds (Li) and upper bounds (Ui) between 0
and 1 for each symbol based upon its probability of occurrence.

3. low = 0.0, high = 1.0, range = 1.0.

4. Read in the next symbol to encode Si.

5. range = high - low.

6. high = low + range * Ui.

7. low = low + range * Li.

8. While more symbols remain to encode, return to step 4.

9. Output the floating point number between low and high that has the shortest
binary representation.
Consider a hypothetical message written in a 3-symbol alphabet D, A, X with the
following probability distribution and bounds. Notice that each character is assigned the 
portion of the 0 to 1 range that corresponds to its probability of appearance. 


Symbol Index   Symbol   Probability   Lower Bound   Upper Bound
(i)            (Si)     (Pi)          (Li)          (Ui)
1              D        0.20          0             0.20
2              A        0.18          0.20          0.38
3              X        0.62          0.38          1

Table 3: Probability Table for Arithmetic Encoding

To see how the method works, we encode the message “DXXA” one symbol at a
time. The first symbol “D” has a range from 0 to 0.20 so we know that the final encoded
value for the message must fall within this range. Each time we add a new symbol we
further restrict the range of our final value that represents the entire message. Thus, the
next symbol “X” restricts us to the last 62% of the current range, or 0.076 to 0.2. The
next symbol is another “X” so we again restrict the output to the last 62% of the current
range, or 0.12312 to 0.2. The final symbol “A” limits us to between 20% and 38% of the
current range, or 0.138496 to 0.1523344. Now we pick the value in the final range with
the shortest binary representation. This is accomplished with some relatively
straightforward binary arithmetic. In this case the shortest value turns out to be
0.140625 = 0.001001 base 2. The message is encoded as a unique floating-point value
between 0 and 1. We require 6 bits to compress 4 symbols (about the same performance
as Huffman on this particular example).
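
The interval narrowing of steps 3 through 8, and the search for the shortest binary fraction in step 9, can be sketched in Java as below. The sketch is an illustration under our own assumptions: it uses double-precision arithmetic, which is adequate only for short messages, and the class and method names are hypothetical. Applied to “DXXA” with the bounds of Table 3, it narrows the interval to roughly 0.138496 through 0.1523344 and selects 9/64 = 0.140625.

```java
class ArithmeticSketch {
    /** Narrows [low, high) once per symbol; lower[] and upper[] are indexed like alphabet. */
    static double[] narrow(String message, String alphabet, double[] lower, double[] upper) {
        double low = 0.0;
        double high = 1.0;
        for (char ch : message.toCharArray()) {
            int i = alphabet.indexOf(ch);
            double range = high - low;          // step 5
            high = low + range * upper[i];      // step 6 (uses the old value of low)
            low = low + range * lower[i];       // step 7
        }
        return new double[] {low, high};
    }

    /** Finds the shortest binary fraction k / 2^bits lying in [low, high); returns {k, bits}. */
    static long[] shortestFraction(double low, double high) {
        for (int bits = 1; bits <= 52; bits++) {
            long scale = 1L << bits;
            long k = (long) Math.ceil(low * scale);   // smallest multiple of 2^-bits >= low
            if ((double) k / scale < high) {
                return new long[] {k, bits};
            }
        }
        throw new IllegalArgumentException("interval too narrow for double precision");
    }

    public static void main(String[] args) {
        double[] lower = {0.0, 0.20, 0.38};     // D, A, X from Table 3
        double[] upper = {0.20, 0.38, 1.0};
        double[] interval = narrow("DXXA", "DAX", lower, upper);
        long[] code = shortestFraction(interval[0], interval[1]);
        System.out.println(interval[0] + " .. " + interval[1]);
        System.out.println(code[0] + " / 2^" + code[1]);
    }
}
```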

Consider the resulting partition of the number line shown below to see how any
string in our alphabet could be similarly, yet uniquely, encoded. Notice that there is
exactly one path to any given “leaf” of our fuzzy ternary tree.
[Diagram: the number line from 0 to 1, subdivided repeatedly into D, A, and X intervals]


Table 4: Partitioning of the Number Line by Arithmetic Encoding: 
A Fuzzy n-ary Tree (intervals not to scale) 


5. Ziv and Lempel 

Jacob Ziv and Abraham Lempel are essentially the fathers of adaptive dictionary 
compression. Their first algorithm, commonly referred to as LZ77 [14], uses previously- 
seen text as a dictionary. The algorithm replaces variable-length phrases (symbols) from 
its input stream with fixed-length indices (codes) into its dictionary. What makes this an 
adaptive method is that its dictionary is literally a sliding window consisting (in a typical 
implementation) of the last 4k bytes of data from the input stream. Thus, while new 
groups of symbols are being read in to a look-ahead buffer, the algorithm is searching for 
matches between the look-ahead buffer and all strings located in the previous 4k bytes of 
data (the dictionary). If a match is found then an ordered triple (a, b, c) is sent to the 
output file. The triple consists of: 

• An index a into the previous 4k of data. 

• The length b of the match. 

• The symbol c immediately following the match in the look-ahead buffer. 
If no match is found then a and b are 0 and c is the first symbol in the look-ahead
buffer. The text window and look-ahead buffer then slide forward sufficiently to shift the 
matching symbol(s) out of the look-ahead buffer and into the dictionary. The process 
then repeats. 

The amount of compression achieved by this method depends upon the dictionary 
entries being sufficiently longer than the ordered triples they are replaced with. Here is a
simplified example of the process: 



Input Stream: ...upon a time there was a wise man who stumbled upon a timeless secret.

[Diagram labels: Sliding Dictionary, Look-Ahead Buffer]


Figure 4: LZ77 Algorithm 


Suppose our sliding dictionary is 64 bytes long and the look-ahead buffer is 16
bytes long. Further suppose that the first occurrence of the word “upon” begins at
position 35 in the current sliding dictionary. Since a match has been found, our
compression program will now send the ordered triple (35, 11, “l”) to the output file.
Since the dictionary is 64 bytes long, 6 bits are required to represent each of the index and
length of the match, plus 8 bits for the ASCII character “l”. Thus, we have just encoded
an 11 * 8 = 88 bit phrase in 6 + 6 + 8 = 20 bits.
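
The search for the longest match can be sketched in Java as a brute-force scan of the sliding dictionary. This is a minimal illustration with hypothetical names, not a tuned implementation. It assumes `pos` is a valid position in the text, and it reports the index relative to the start of the current window, so the index depends on where the window happens to begin (0 in the demonstration below, rather than the 35 assumed in the example above).

```java
class Lz77Sketch {
    static final class Triple {
        final int index;     // a: where the match starts inside the sliding dictionary
        final int length;    // b: length of the match
        final char next;     // c: symbol that follows the match in the look-ahead buffer

        Triple(int index, int length, char next) {
            this.index = index;
            this.length = length;
            this.next = next;
        }
    }

    static Triple longestMatch(String text, int pos, int windowSize, int lookAheadSize) {
        int windowStart = Math.max(0, pos - windowSize);
        // Keep one look-ahead symbol in reserve so that c always exists.
        int maxLen = Math.min(lookAheadSize, text.length() - pos) - 1;
        int bestIndex = 0;
        int bestLen = 0;
        for (int start = windowStart; start < pos; start++) {
            int len = 0;
            while (len < maxLen && text.charAt(start + len) == text.charAt(pos + len)) {
                len++;
            }
            if (len > bestLen) {
                bestLen = len;
                bestIndex = start - windowStart;
            }
        }
        // With no match this yields (0, 0, first look-ahead symbol).
        return new Triple(bestIndex, bestLen, text.charAt(pos + bestLen));
    }

    public static void main(String[] args) {
        String text = "upon a time there was a wise man who stumbled upon a timeless secret.";
        Triple t = longestMatch(text, text.lastIndexOf("upon"), 64, 16);
        System.out.println("(" + t.index + ", " + t.length + ", " + t.next + ")");
    }
}
```

The nested loops make the cost of enlarging either buffer visible: every dictionary position is compared against the look-ahead buffer.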

There are two problems with LZ77. First, since the dictionary is constantly 
sliding, the algorithm sometimes actually throws away valuable old strings for worthless 
new ones. This is a potential drawback to adaptive methods in general. Second, the
length of pattern matches is limited to the length of the look-ahead buffer. Increasing the 
size of the look-ahead buffer, however, has a nasty side effect - it proportionately 
increases the number of string comparisons that must be performed against the 
dictionary. This creates a significant performance problem. Thus, LZ77 is not able to 
store long strings to its dictionary even when it would be highly profitable to do so. 

A second adaptive dictionary technique developed by Ziv and Lempel, referred to 
as LZ78 [16], uses a different approach to building its dictionary. Instead of maintaining 
a sliding window into the preceding text, LZ78 incrementally builds its dictionary from 
strings found in all of the preceding text. The algorithm does this by growing dictionary 
entries one character at a time. For example, the first time the string “hello” is 
encountered it is not put in the dictionary. Instead, “h” is added to the dictionary. The 
second time “hello” is encountered it is still not put in the dictionary. Instead, “he” is
added. The third occurrence puts “hel” into the dictionary, and so on until the fifth
occurrence of the string “hello” actually causes “hello” to become part of the dictionary.
This incremental procedure of slowly adding a string to the dictionary does a good job of
eventually getting all of the frequently used strings into the dictionary. Once the
dictionary begins to accurately model the source file, large savings can be achieved as the
algorithm replaces long frequently occurring strings with short fixed-length indices. 
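
The phrase-growing behavior can be seen in the following Java sketch of the usual textbook formulation of LZ78, in which each output pair names the longest phrase already in the dictionary plus the one symbol that extends it, and the extended phrase becomes a new entry. The names, the unbounded dictionary, and the use of -1 to mark a missing final symbol are our own simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Lz78Sketch {
    /** Each pair is {index of longest known phrase, next symbol}; -1 means no symbol (end of input). */
    static List<int[]> encode(String text) {
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("", 0);                          // entry 0 is the null string
        List<int[]> out = new ArrayList<>();
        String phrase = "";
        for (char ch : text.toCharArray()) {
            String longer = phrase + ch;
            if (dictionary.containsKey(longer)) {
                phrase = longer;                        // keep extending a known phrase
            } else {
                out.add(new int[] {dictionary.get(phrase), ch});
                dictionary.put(longer, dictionary.size());   // grow by one character
                phrase = "";
            }
        }
        if (!phrase.isEmpty()) {
            out.add(new int[] {dictionary.get(phrase), -1});
        }
        return out;
    }

    /** Rebuilds the same dictionary from the pairs alone; none is stored in the output. */
    static String decode(List<int[]> pairs) {
        List<String> dictionary = new ArrayList<>();
        dictionary.add("");
        StringBuilder out = new StringBuilder();
        for (int[] pair : pairs) {
            String entry = dictionary.get(pair[0]);
            if (pair[1] >= 0) {
                entry += (char) pair[1];
                dictionary.add(entry);
            }
            out.append(entry);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<int[]> pairs = encode("hello hello hello hello");
        System.out.println(pairs.size() + " pairs, decoded: " + decode(pairs));
    }
}
```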

LZ78 does not suffer from the same limitations as LZ77. It is able to handle long
strings and once it adds a string to its dictionary it remains there until the entire file is
processed. LZ78 has its own difficulties, however. In LZ77 the dictionary is easier to
manage. It is a fixed block of already processed data. In LZ78 the dictionary starts with
nothing but the null string and grows. It must be managed with a data structure
(typically an n-ary tree that looks remarkably like the fuzzy tree shown earlier for 
arithmetic encoding) and limited to some finite size (typically around 2'^ entries). 
Despite these problems LZ78, like LZ77, was a groundbreaking compression approach. 
Terry Welch later enhanced LZ78 - the resulting algorithm is commonly referred to as 
LZW. 
III. MATHEMATICAL FOUNDATIONS


A. INTRODUCTION 

In this chapter we will discuss the necklace algorithm and the properties of the 
necklace algorithm that we find to be useful for compression. First we explore properties 
of binary n-tuple necklaces. Next, we consider equivalence classes of necklaces and how 
to enumerate the equivalence classes for a given n using the Burnside formula [17]. Then 
we use the Theta Algorithm [3] to generate all of the equivalent necklaces for a given n. 
Next, we consider the space reduction that can be realized from the sub-cyclic properties 
of necklaces. Finally, we describe a close relative to the necklace classes, the Lyndon 
Words. 

B. NECKLACE CLASSES 

The binary n-tuples are strings of 0’s and 1’s of a fixed length n. The circular
rotation of a binary n-tuple produces a cyclic equivalence class of binary n-tuples called a
necklace class. We represent each necklace class by the numerically largest n-tuple in the
class. This gives us Z[n] necklaces representing the 2^n strings from 000...0 to 111...1,
where Z[n] is defined in equation 4. The necklace algorithm provides an efficient way to 
list these necklace classes. The necklace algorithm typically produces the necklaces in the 
order from largest to smallest. There is no difference in which way we represent these 
strings and an equivalent version of the necklace algorithm could invert the order of the 
necklace classes. 
1. Equivalence Classes and the Burnside Formula


A necklace is an equivalence class of strings of bits of length n that can be rotated 
to obtain equivalent aspects of the same necklace. Thus, two necklaces are said to be 
inequivalent if, no matter how rotated, one cannot be transformed into the other. Z[n] 
gives the number of inequivalent necklaces of length n, where Z[n] is determined by 
using an instance of the Burnside enumeration formula: 

$Z[n] = \frac{1}{n} \sum_{d \mid n} \phi(d) \cdot 2^{n/d}$, where the sum runs over all divisors d of n from d = 1 to n.

Equation 4

Here $\phi(d)$ is Euler’s Totient Function, defined as the number of positive integers not
exceeding d that are relatively prime to d.

The formula in equation 4 enumerates the necklace classes for a given n, and the
necklace algorithm tells us what the necklaces are. To illustrate the use of the formula
we consider an example:

Let n = 6, then

$Z[6] = \frac{1}{6}(1 \cdot 2^6 + 1 \cdot 2^3 + 2 \cdot 2^2 + 2 \cdot 2^1) = \frac{1}{6}(64 + 8 + 8 + 4) = 84/6 = 14$,

where the four terms correspond to d = 1, d = 2, d = 3, and d = 6, respectively.

When d = 1, only 1 is relatively prime to 1. When d = 2, again only 1 is relatively 
prime to 2. When d = 3, 1 and 2 are relatively prime to 3. When d = 6, both 1 and 5 are 
relatively prime to 6. 
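
Equation 4 translates directly into a short Java sketch. The names are hypothetical, the totient is computed by brute force for clarity rather than speed, and n is assumed to lie between 1 and 60 so that the powers of two fit in a `long`.

```java
class NecklaceCountSketch {
    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Euler's totient: the number of k in 1..d that are relatively prime to d. */
    static int totient(int d) {
        int count = 0;
        for (int k = 1; k <= d; k++) {
            if (gcd(k, d) == 1) {
                count++;
            }
        }
        return count;
    }

    /** Z[n] from Equation 4: the number of binary necklace classes of length n. */
    static long necklaceCount(int n) {
        long sum = 0;
        for (int d = 1; d <= n; d++) {
            if (n % d == 0) {
                sum += totient(d) * (1L << (n / d));   // phi(d) * 2^(n/d)
            }
        }
        return sum / n;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.println("Z[" + n + "] = " + necklaceCount(n));
        }
    }
}
```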

We identify two upper bounds for the number of necklace classes of length n. A
simple upper bound is [expression illegible in the source]. A tighter upper bound is
[expression illegible in the source]. When n = 6, we
know that there are 14 necklace classes. If we compute the first upper bound we get
21.33 while the second upper bound yields 16. We use the tighter upper bound in the 
implementation of our compression algorithms. 

2. Necklace Algorithm 

The necklace algorithm is fairly straightforward. First we need to define the $\theta$
step. This operation $\theta: V^n \to V^n$ is defined as follows: $\theta(a_1, a_2, \ldots, a_n) =
(b_1, b_2, \ldots, b_n)$ where $b_i = a_i$ for i = 1, 2, ..., j-1 and j is defined as the largest subscript such
that $a_j > 0$ and $a_k = 0$ for all k > j. Then

$b_j = a_j - 1$, $b_{j+t} = b_t$ for t = 1, 2, ..., n-j.

Necklace Algorithm (for necklaces of length n) [3]

0. The initial necklace is 11...1 = $1^n$.

1. To find the (i+1)st necklace apply $\theta$ to the ith necklace.

2. The resulting string is the next necklace if and only if j|n.

3. If j does not divide n, then apply $\theta^2, \theta^3, \ldots, \theta^k$ to the ith necklace until the
smallest k is found so that j|n. The resulting string is the next necklace.

4. If the necklace found is not the last necklace, namely, $0^n$, return to step 1.

Note: If step 2 is modified such that j = n, we produce a list of all Lyndon words
of length n (Lyndon words are discussed in more detail in paragraph C).

To illustrate and to clarify the algorithm, start with the largest n-long necklace (an
n-long string of 1’s), subtract one from the last 1 in the string, and repeat the first j
symbols of the result until an n-long string is formed. Now do the “j divides n check”. If j|n
then the string formed is a necklace; if not, subtract one again from the last non-zero in
the current string and copy to form an n-long string.

As an example consider the necklaces produced when n = 6: 


111111           This is the first necklace (a_1, a_2, ..., a_6).
111110   j = 6   Since 6 divides 6, this is a necklace.
111101   j = 5   Since 5 does not divide 6, this is not a necklace.
111100   j = 6   Since 6 divides 6, this is a necklace.
111011   j = 4   Since 4 does not divide 6, this is not a necklace.
111010   j = 6   Since 6 divides 6, this is a necklace.
111001   j = 5   Since 5 does not divide 6, this is not a necklace.
111000   j = 6   Since 6 divides 6, this is a necklace.
110110   j = 3   Since 3 divides 6, this is a necklace.
110101   j = 5   Since 5 does not divide 6, this is not a necklace.
110100   j = 6   Since 6 divides 6, this is a necklace.
110011   j = 4   Since 4 does not divide 6, this is not a necklace.
110010   j = 6   Since 6 divides 6, this is a necklace.
110001   j = 5   Since 5 does not divide 6, this is not a necklace.
110000   j = 6   Since 6 divides 6, this is a necklace.
101010   j = 2   Since 2 divides 6, this is a necklace.
101001   j = 5   Since 5 does not divide 6, this is not a necklace.
101000   j = 6   Since 6 divides 6, this is a necklace.
100100   j = 3   Since 3 divides 6, this is a necklace.
100010   j = 4   Since 4 does not divide 6, this is not a necklace.
100001   j = 5   Since 5 does not divide 6, this is not a necklace.
100000   j = 6   Since 6 divides 6, this is a necklace.
000000   j = 1   Since 1 divides 6, this is a necklace.


There are fourteen necklaces generated by the Necklace Algorithm, which is the
number given by equation 4. Additionally, it is easy to see that a string that ends with
a 1 (except all 1's, of course) can never be a necklace, since that 1 could be rotated to the
front of the n-tuple, and the rotated n-tuple, being larger, will have already appeared as a
previous necklace in the lexicographic listing of the necklaces. For each value of j, the
string a_1, a_2, ..., a_j is a Lyndon word of length j. (See paragraph C for a discussion of
Lyndon words.)

3. Necklace Algorithm Pseudo Code 


DESCRIPTION: Finds all the necklace classes of length n. Note the indexing 
scheme used by the bit arrays of this method. The most significant bit is 
considered to be at index 1 while the least significant bit is at index n. 

PRECONDITION: The array classes is large enough to hold all the necklace
classes generated by the algorithm.

POSTCONDITION: classes contains all the necklace classes of length n sorted in
descending numerical order.

void getNecklaceClasses(/* IN */  int n,
                        /* OUT */ BitArray[] classes)
{
    BitArray c = 2^n - 1;          // the first class is all 1's
    classes[0] = c;                // store it at index 0 of classes
    int k = 1;                     // index to store next class at
    while (c > 0)
    {
        int j = c.leastSig1();     // j gets index of least sig 1 in c
        c[j] = 0;                  // replace least sig 1 in c with 0
        int i = 1;
        int p = j + 1;             // next position to fill; j itself is kept for the j-check
        while (p <= n)
            c[p++] = c[i++];       // copy over, copy over,...
        if (j|n)                   // j-check, if TRUE it's a class
            classes[k++] = c;
    }
}


Figure 5: The Necklace Algorithm Pseudo Code 
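
The pseudo code can be turned into a small runnable program. The following is a minimal sketch, not the implementation from Appendix A: the class name NecklaceSketch, the int-array stand-in for BitArray, and the string rendering are all assumptions made for illustration. It keeps the 1-based indexing described above.

```java
import java.util.ArrayList;
import java.util.List;

final class NecklaceSketch {

    /** Returns every n-bit necklace class, largest first, as strings of '0' and '1'. */
    static List<String> necklaces(int n) {
        int[] c = new int[n + 1];                 // 1-based: c[1] is the most significant bit
        for (int p = 1; p <= n; p++) {
            c[p] = 1;                             // the first class is all 1's
        }
        List<String> classes = new ArrayList<>();
        classes.add(render(c, n));
        while (true) {
            int j = n;
            while (j >= 1 && c[j] == 0) {
                j--;                              // j = index of the least significant 1
            }
            if (j == 0) {
                break;                            // c is all 0's, so the list is complete
            }
            c[j] = 0;                             // replace the least significant 1 with 0
            for (int p = j + 1, i = 1; p <= n; p++, i++) {
                c[p] = c[i];                      // copy the prefix c[1..j] periodically
            }
            if (n % j == 0) {                     // the "j divides n" check
                classes.add(render(c, n));
            }
        }
        return classes;
    }

    private static String render(int[] c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int p = 1; p <= n; p++) {
            sb.append(c[p]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> classes = necklaces(6);
        System.out.println(classes.size() + " classes: " + classes);
    }
}
```

For n = 6 the sketch lists the same fourteen classes as the worked example, starting with 111111 and ending with 000000.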
4. Sub-cyclic Properties


As mentioned previously, the necklace algorithm generates all of the necklace
classes for a given n. These necklaces are listed lexicographically from the largest (1^n) to
the smallest (0^n). This is contrary to normal order, but the results are equivalent to an
ordering from smallest to largest, and it is easy to translate between the two listings.

One of the most interesting properties of the necklace algorithm is that the sub-cyclic
rotations also occur for each divisor of a given n. That is, all the m-long necklaces
appear where m is a factor of n. The factors of n tell us the lengths of the sub-cyclic or
repeating strings for the given n. Using the Möbius function [17] in place of the totient
function, one may enumerate the sub-cyclic rotations for the given factors of n. In fact,
regardless of the value of n, the number of sub-cyclic rotations of length m will always be
the same for the factors m of n. For example, since 1 is a factor of every n, there are
always two sub-cyclic rotations of length 1, namely 0 and 1, and the necklace (10) appears
in the list of necklace classes whenever n is even. Thus, when m|n, each m-long necklace
will appear in the necklace algorithm for the parameter n.

In the example given, where n = 6 has the factors 1, 2, 3 and 6, the necklaces
that appear are all the necklaces for m = 1, 2, 3 and 6. If we modify the enumeration
formula in equation 4 by substituting for φ(d) the function μ(d), where μ(d) (the Möbius
function) is defined by

μ(n) = (-1)^k when n has k distinct prime factors;

μ(n) = 0 when n has a square or higher order factor,

we obtain the expression on the right hand side of equation 5.

We obtain the number of necklaces of length n by applying this modified
enumeration formula on each of these factors m of n.

\[ N(m) = \frac{1}{m} \sum_{d \mid m} \mu(d) \cdot 2^{m/d} \]

Equation 5

So, when m is 1, there are 2 necklaces; when m is 2, there is 1 necklace. We
determine the number of necklaces when m is 3 to be 1/3(1*2^3 - 1*2) = 1/3(8 - 2) = 6/3 = 2.
These are the necklaces (110 and 100). Finally, we determine the number of necklaces
when m is 6 as 1/6(1*2^6 - 1*2^3 - 1*2^2 + 1*2) = 1/6(64 - 8 - 4 + 2) = 54/6 = 9. Therefore, 9 of the
14 necklace classes will be of length 6, as can be determined from our listing above of the
necklace algorithm. Thus the total number of necklaces, including all sub-cyclic
necklaces, for n = 6 is (2 + 1 + 2 + 9) = 14.
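
The arithmetic above can be checked mechanically. The following is a minimal sketch of Equation 5, read as N(m) = (1/m) times the sum over divisors d of m of μ(d) * 2^(m/d); the class and method names are assumptions made for illustration and do not come from Appendix A.

```java
final class AperiodicCountSketch {

    /** Moebius function mu(d), computed by trial division. */
    static int mobius(int d) {
        int result = 1;
        for (int p = 2; (long) p * p <= d; p++) {
            if (d % p == 0) {
                d /= p;
                if (d % p == 0) {
                    return 0;              // d has a squared prime factor
                }
                result = -result;          // one more distinct prime factor
            }
        }
        if (d > 1) {
            result = -result;              // the remaining factor is a single prime
        }
        return result;
    }

    /** N(m): number of necklaces whose sub-cyclic length is exactly m (1 <= m <= 62). */
    static long aperiodic(int m) {
        long sum = 0;
        for (int d = 1; d <= m; d++) {
            if (m % d == 0) {
                sum += mobius(d) * (1L << (m / d));
            }
        }
        return sum / m;
    }

    /** Total number of n-bit classes: N(m) summed over every factor m of n. */
    static long total(int n) {
        long t = 0;
        for (int m = 1; m <= n; m++) {
            if (n % m == 0) {
                t += aperiodic(m);
            }
        }
        return t;
    }

    public static void main(String[] args) {
        for (int m : new int[] {1, 2, 3, 6}) {
            System.out.println("N(" + m + ") = " + aperiodic(m));
        }
        System.out.println("total for n = 6: " + total(6));
    }
}
```

Run for n = 6, the sketch reports N(1) = 2, N(2) = 1, N(3) = 2 and N(6) = 9, for a total of 14.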

These sub-cyclic rotations are important in our version of the data compression
algorithms because it is cheaper to represent a string with fewer rotations than one with
more rotations. Namely, it costs only 1 bit to represent the rotations of a necklace with
sub-cyclic length 2, whereas it costs 3 bits to represent all of the possible rotations of a
necklace of length 6.

C. LYNDON WORDS [18]


Although the two lossless compression techniques discussed in this paper do not 
use Lyndon words, we cover them because of their relation to necklaces and because we 
recommend them for future research. 

A k-ary necklace is an equivalence class of k-ary strings under rotation. We take
the lexicographically smallest such string as the representative of each equivalence class
and use this in the output of the program. A Lyndon word is an aperiodic necklace
representative.

The illustration below shows the 6 binary necklaces with 4 beads and the 
corresponding equivalence classes of strings. The three Lyndon words are 0001, 0011, 
and 0111. 


Necklace    Equivalence class
0000        0000
0001        0001  0010  0100  1000
0011        0011  0110  1100  1001
0101        0101  1010
0111        0111  1011  1101  1110
1111        1111


Figure 6: Lyndon words of length four. 


Equation 6 gives the explicit formula for the number of Lyndon words of length n
over a k-ary alphabet.


\[ N_k(n) = \frac{1}{n} \sum_{d \mid n} \mu(n/d) \cdot k^{d} \]


Equation 6
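
As a worked check of Equation 6, take the binary alphabet (k = 2) and n = 4, and assume the Möbius argument is read as n/d. The divisors of 4 are 1, 2 and 4, with μ(1) = 1, μ(2) = -1 and μ(4) = 0, so the count agrees with the three Lyndon words 0001, 0011 and 0111 listed in Figure 6:

```latex
\begin{align*}
N_2(4) &= \frac{1}{4} \sum_{d \mid 4} \mu(4/d) \cdot 2^{d} \\
       &= \frac{1}{4} \left( \mu(4) \cdot 2^{1} + \mu(2) \cdot 2^{2} + \mu(1) \cdot 2^{4} \right) \\
       &= \frac{1}{4} \left( 0 - 4 + 16 \right) = 3
\end{align*}
```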
IV. ROTATIONAL TREE APPROACH


A. INTRODUCTION 

In this chapter we exhibit the bulk of our research efforts. Our efforts to find a
compression application involving the binary necklace classes discussed in section III
led us to the Rotational Tree Approach (RTA). Our motivation for exploring the
approach arises from the following observation. Let A be the set of all bit strings of
length n. Further, define B to be the set of all necklace classes of length n. Consider the
onto mapping from the domain A to the codomain B, where each n-tuple maps to its
representative necklace class in B:



Figure 7: Mapping of Bit Strings to Their Necklace Class 

This mapping is attractive from a compression standpoint because there is a
significant space reduction between the number of bit strings and the number of necklace
classes to which they map, since there are 2^n bit strings of length n and fewer than
2^(n+1)/n necklace classes.


n     Bit Strings               Classes        Decrease
6     2^6  = 64                 14             78%
12    2^12 = 4,096              352            91%
16    2^16 = 65,536             4,116          94%
20    2^20 = 1,048,576          52,488         95%
24    2^24 = 16,777,216         699,252        96%
28    2^28 = 268,435,456        9,587,580      96%
32    2^32 = 4,294,967,296      134,219,796    97%


Table 5: Space Reduction Afforded by Mapping Strings to Necklace Classes 
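
The Classes column of Table 5 can be reproduced with a few lines of code. The sketch below assumes that equation 4 is the standard totient form of the necklace count, (1/n) times the sum over divisors d of n of φ(d) * 2^(n/d); the class and method names are illustrative only.

```java
final class NecklaceCountSketch {

    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Euler's totient phi(d), by direct counting (adequate for small d). */
    static int totient(int d) {
        int count = 0;
        for (int x = 1; x <= d; x++) {
            if (gcd(x, d) == 1) {
                count++;
            }
        }
        return count;
    }

    /** Number of n-bit necklace classes under the assumed form of equation 4 (n <= 62). */
    static long classCount(int n) {
        long sum = 0;
        for (int d = 1; d <= n; d++) {
            if (n % d == 0) {
                sum += totient(d) * (1L << (n / d));
            }
        }
        return sum / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {6, 12, 16, 20, 24, 28, 32}) {
            long strings = 1L << n;
            long classes = classCount(n);
            long decrease = Math.round(100.0 * (strings - classes) / strings);
            System.out.println(n + "  " + strings + "  " + classes + "  " + decrease + "%");
        }
    }
}
```

Under that assumption the sketch returns 14 classes for n = 6, 352 for n = 12 and 134,219,796 for n = 32, in agreement with the table.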

B. EXPLANATION 

In order to take advantage of the resulting space reduction, our idea is to map each
source symbol in A to its corresponding necklace class in B, and then represent that class
with an index. We define the index i of an n-bit class as the position of the class in the
ordered list of all n-bit classes as given by the Necklace Algorithm. Thus, by mapping a
symbol to a class index we can effectively decrease the number of distinct symbols as
well as the size of each such symbol. This sounds promising.

The difficulty is that there are many n-bit strings that map to the same class index.
Thus, in order to reverse the mapping, which is necessary for decompression, additional
information is needed; specifically, we need to know the number of rotations r originally
used to transform the bit string to its necklace class.

What we have at this point is a vague model involving the quantities r (rotation)
and i (index). What we need is a clever way to use them. As r and i are fixed in length
(with respect to a given n), static Huffman encoding seems a good choice.
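
To make the roles of r and the necklace class concrete, the following minimal sketch maps an n-bit symbol to its class and back. It assumes, consistent with the example in section D, that the class representative is the numerically largest rotation and that r counts left rotations; the names RotationSketch, toClass and fromClass are hypothetical, and the sketch is limited to n below 31 because it works on int values.

```java
final class RotationSketch {

    /** Rotates the low n bits of s left by one position. */
    static int rotateLeft(int s, int n) {
        return ((s << 1) | (s >>> (n - 1))) & ((1 << n) - 1);
    }

    /** Returns {r, c}: c is the largest rotation of s, r the left rotations needed to reach it. */
    static int[] toClass(int s, int n) {
        int best = s;
        int bestR = 0;
        int cur = s;
        for (int r = 1; r < n; r++) {
            cur = rotateLeft(cur, n);
            if (cur > best) {
                best = cur;
                bestR = r;
            }
        }
        return new int[] {bestR, best};
    }

    /** Reverses the mapping: rotates the class representative c right by r positions. */
    static int fromClass(int c, int r, int n) {
        int mask = (1 << n) - 1;
        return ((c >>> r) | (c << (n - r))) & mask;
    }

    public static void main(String[] args) {
        int[] rc = toClass('a', 8);   // 'a' is 01100001
        System.out.println("r = " + rc[0] + ", class = " + Integer.toBinaryString(rc[1]));
        System.out.println("restored = " + (char) fromClass(rc[1], rc[0], 8));
    }
}
```

Without r, the class 11000010 could stand for any of its eight rotations; with r = 1, fromClass recovers the original symbol.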

Our initial efforts revolved around a two-Huffman-tree approach. We proceed as
follows: We first read n-bit symbols from a source file. We then break each symbol into
an index i and a rotation r. We build one Huffman tree using all the r's and another using
all the i's. Finally, we write the Huffman codes for each r, i pair to the output file, in the
order in which they were first discovered.

Surprisingly (for us at least) this approach is a complete failure. It typically
achieves very low levels of compression (often even bloating the source file). After some
analysis involving comparisons against a standard single Huffman tree built with n-bit
symbols, we draw the following conclusions concerning this failed two-tree approach:

• The i-Huffman tree is much narrower and shallower than a standard Huffman
tree built with n-bit symbols. This means the codes produced by the i-Huffman
tree are shorter. This is exactly the type of decrease in symbol
number and size that we are looking for.

• The amount of statistical data (output file header) required for the two-tree
approach is significantly less than that required for standard Huffman (more
on this later).

• The added expense of the r-codes paired with every i-code completely erases
the gains cited above.

Essentially, the r-codes are an added expense that the model cannot afford. What
we need is a way to make the r-codes either smaller or else somehow more useful to the
model. Our solution consisted of developing an approach involving multiple Huffman
trees. It proceeds as follows: We first read n-bit symbols from a source file. Then we
break each symbol into a rotation r and an index i. We put the r's into one Huffman tree, and
distribute the i's to a forest of Huffman trees based upon their associated r's. Finally, we
write the Huffman codes for each r, i pair to the output file, in the same order in which
they were discovered.

This approach puts the r-codes to work by using them to distribute the i-codes to 
not one, but many Huffman trees. This is advantageous, as each tree in the forest will be 
both narrower and shallower than the original i-tree, and as such will produce shorter 
codes. The next section refines this procedure. 

C. ALGORITHM 

1. Build a static statistical model of the data as follows: 

a. Construct n + 1 empty frequency tables (indexed from 0 to n) where n 
represents the length in bits of the symbols read from the input stream. 

b. Read the next n-bit symbol S from the input stream. 

c. Determine the number r of rotations required to transform S into its 
necklace class representative c. 

d. Determine the index i of c in an ordered list of all n-bit necklace classes. 

e. If i does not exist in frequency table r, put i into frequency table r with a
frequency count of 1. Else, increment the frequency count of i in
frequency table r.

f. If r does not exist in frequency table n, put r into frequency table n with a
frequency count of 1. Else, increment the frequency count of r in
frequency table n.

g. While there are more symbols in the input stream, repeat from step b.

2. Create a Huffman tree from each of the n + 1 frequency tables. The leaves of
Huffman tree n represent rotations. The leaves of all other Huffman trees
represent class indices.

3. Reset the input pointer to the beginning of the input stream.

4. Read the next n-bit symbol S from the input stream.

5. Determine r and i for S.

6. Write the Huffman code for r in Huffman tree n to the output stream.

7. Write the Huffman code for i in Huffman tree r to the output stream.

8. While there are more symbols in the input stream, repeat step 4. 
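
Step 1 of the algorithm can be sketched as follows. This is an illustrative sketch rather than the code of Appendix A: it assumes the largest rotation is the class representative, that r counts left rotations, and that class indices start at 0 for the all-1's class. The class list is found by brute force, which is only practical for small n, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RtaModelSketch {

    /** Rotates the low n bits of s left by one position. */
    static int rotateLeft(int s, int n) {
        return ((s << 1) | (s >>> (n - 1))) & ((1 << n) - 1);
    }

    /** Returns {r, c}: c is the largest rotation of s, r the left rotations needed to reach it. */
    static int[] toClass(int s, int n) {
        int best = s;
        int bestR = 0;
        int cur = s;
        for (int r = 1; r < n; r++) {
            cur = rotateLeft(cur, n);
            if (cur > best) {
                best = cur;
                bestR = r;
            }
        }
        return new int[] {bestR, best};
    }

    /** All n-bit class representatives, largest first (brute force; fine for small n). */
    static List<Integer> classList(int n) {
        List<Integer> reps = new ArrayList<>();
        for (int s = (1 << n) - 1; s >= 0; s--) {
            if (toClass(s, n)[1] == s) {
                reps.add(s);
            }
        }
        return reps;
    }

    /** Step 1: tables 0..n-1 map a class index i to a count; table n maps r to a count. */
    static List<Map<Integer, Integer>> buildTables(int[] symbols, int n) {
        List<Integer> classes = classList(n);
        List<Map<Integer, Integer>> tables = new ArrayList<>();
        for (int t = 0; t <= n; t++) {
            tables.add(new TreeMap<>());                      // step 1a
        }
        for (int s : symbols) {                               // steps 1b and 1g
            int[] rc = toClass(s, n);                         // step 1c
            int i = classes.indexOf(rc[1]);                   // step 1d
            tables.get(rc[0]).merge(i, 1, Integer::sum);      // step 1e
            tables.get(n).merge(rc[0], 1, Integer::sum);      // step 1f
        }
        return tables;
    }

    public static void main(String[] args) {
        String message = "acgCs7bnacgCs7bnacgCs7bnggCCCCssss77777bnbnbnbnbnbnbnn";
        List<Map<Integer, Integer>> tables = buildTables(message.chars().toArray(), 8);
        for (int t = 0; t <= 8; t++) {
            if (!tables.get(t).isEmpty()) {
                System.out.println("table " + t + ": " + tables.get(t));
            }
        }
    }
}
```

The main method uses the 54-character message of section D with n = 8, so its output can be compared against the frequency tables shown there.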

D. EXAMPLE 


Consider a hypothetical 54-character message and its resultant breakdown: 
Message: acgCs7bnacgCs7bnacgCs7bnggCCCCssss77777bnbnbnbnbnbnbnn 


Symbol   Binary Representation   Necklace Class   r    i     Frequency
a        01100001                11000010         1    26    3
c        01100011                11011000         6    18    3
g        01100111                11101100         5    10    5
C        01000011                11010000         6    21    7
s        01110011                11100110         1    13    7
7        00110111                11100110         5    13    8
b        01100010                11000100         1    25    10
n        01101110                11100110         4    13    11
Total                                                        54


Table 6: Example Message Breakdown by RTA (n = 8)

Step 1 of the RTA algorithm constructs the following frequency tables:


Freq Table 1
i     Frequency
26    3
13    7
25    10

Freq Table 4
i     Frequency
13    11

Freq Table 5
i     Frequency
10    5
13    8

Freq Table 6
i     Frequency
18    3
21    7

Freq Table n = 8
r     Frequency
4     11
5     13
6     10
1     20


Figure 8: RTA Step 1: Build Frequency Tables



Step 2 of the RTA algorithm constructs a Huffman tree for each frequency table.



Figure 9: RTA Step 2: Build Huffman Trees

With the Huffman trees constructed, steps 3 through 8 of the RTA algorithm
create the output codes. This is accomplished by generating the r, i pairs
to replace each input symbol in the message. Thus, the stream of input symbols

acgCs7bna...

is transformed into the stream of r, i pairs:

1,26 6,18 5,10 6,21 1,13 5,13 1,25 4,13 1,26...

which then becomes the output stream:

11,10 10,0 01,0 10,1 11,11 01,1 11,0 00,0 11,10...

by replacing each r with its Huffman code from tree n = 8 and each i with its Huffman
code from tree r. For example, the r, i pair that corresponds to symbol "a" (found in
Table 6 above) is 1, 26. The code that represents r from the n = 8 tree (Figure 9 above) is
11. The code that represents the i component from Huffman tree 1 (Figure 9 above) is 10.
Therefore the input symbol "a" becomes the output 1110. Now move to the next symbol
in the stream and repeat the process.

F. INTERPRETATION OF RESULTS 

We successfully implemented RTA in the Java programming language (see
Appendix A) and collected empirical data (see Appendix B). The overall compression
performance of RTA is roughly equivalent to that of standard Huffman encoding on both
text and uncompressed bitmapped image data. Computationally, RTA is more expensive
than Huffman encoding, but not prohibitively so. However, our implementation of RTA
is space-hungry, requiring up to 5*2^n + (4*2^(n+1))/n bytes just to build the tables
needed to efficiently calculate r and i for all the input symbols. This calculation does not
include the memory needed to build the actual Huffman trees, which is data dependent
but can also be very significant. The compression results for RTA are in fact cut off at
n = 24 due to memory constraints: a typical 0.5 MB file requires 86 MB of table-building
memory alone!

The compression performance of RTA is most easily explained by once again 
considering the failed two-tree approach discussed in part B above. By comparison with 
standard Huffman encoding, the two-tree approach has a favorable header size, and a 
favorable i-tree width and depth. Its fatal flaw is the overhead of the r-codes paired with 
every i-code. RTA is strapped by the same r-code overhead, but unlike the two-tree 
method, RTA uses its r-codes advantageously. By building a forest of i-trees instead of a 
single i-tree, RTA is able to generate much shorter i-codes. This allows the RTA forest 
to approach the performance of a single optimal Huffman tree - even with the added 
overhead of the r-codes. A useful way to view the Huffinan forest created by RTA is as a 
single distributed pseudo-Huffman tree with the r-tree as the root, and the i-trees as 
leaves. 
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Figure 10; Distributed Pseudo-Huffman Tree Created by RTA 


It is easy to see that this particular distributed tree is not a true Huffman tree and 
is therefore not optimal. Note that node i=10 from i-tree 5 is higher in the distributed tree 
than node i=13 from i-tree 1, even though node i=13 has a greater frequency. 

After studying the tree breakdowns of our empirical data (see Appendix D for an
example) we conclude that the weight of the distributed tree created by RTA is typically
within 5% of the weight of an optimal Huffman tree. While it is theoretically possible for
the distributed tree to be a true Huffman tree, it would be an extremely rare occurrence
when compressing real data.
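
For the small example of section D, the two weights can be computed directly from the frequency counts. The sketch below is illustrative only: it uses the standard fact that the total cost of a Huffman tree equals the sum of the weights of its merged (internal) nodes, and it assumes that a tree with a single leaf still spends one bit per symbol, as the code "00,0" for symbol n in the example output suggests. The class and method names are hypothetical.

```java
import java.util.PriorityQueue;

final class TreeWeightSketch {

    /** Total bits a Huffman tree built from these counts spends on the whole message. */
    static long huffmanWeight(int... counts) {
        if (counts.length == 1) {
            return counts[0];              // assumed: a lone leaf still costs one bit per symbol
        }
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (int c : counts) {
            heap.add((long) c);
        }
        long total = 0;
        while (heap.size() > 1) {
            long merged = heap.poll() + heap.poll();
            total += merged;               // each merge deepens every leaf below it by one bit
            heap.add(merged);
        }
        return total;
    }

    /** Weight of the RTA forest for the example: the r-tree plus i-trees 1, 4, 5 and 6. */
    static long rtaExampleWeight() {
        return huffmanWeight(20, 11, 13, 10)   // r-tree (r = 1, 4, 5, 6)
             + huffmanWeight(3, 7, 10)         // i-tree 1
             + huffmanWeight(11)               // i-tree 4
             + huffmanWeight(5, 8)             // i-tree 5
             + huffmanWeight(3, 7);            // i-tree 6
    }

    public static void main(String[] args) {
        System.out.println("single Huffman tree:  " + huffmanWeight(3, 3, 5, 7, 7, 8, 10, 11) + " bits");
        System.out.println("RTA distributed tree: " + rtaExampleWeight() + " bits");
    }
}
```

For the 54-character example this gives 157 bits for a single Huffman tree over the eight symbols and 172 bits for the distributed tree, which illustrates on a very small input why the distributed tree is not optimal.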

Using this insight into the distributed tree of RTA, we conclude that the sum of 
the lengths of the i-codes and r-codes produced by RTA will approach the sum of the 
codes produced by a standard Huffman tree built from the same data. The problem is, if 
RTA can only approach Huffman encoding, then how is it that the empirical data shows 
RTA marginally outperforming Huffman encoding on almost half of the files in the test 
suite? The answer lies in the compression header. RTA requires less overhead than 
standard Huffman encoding to pass the statistical model from the compression program 
to the decompression program. 

Model overhead is one of the drawbacks of static methods like Huffman
encoding, the two-tree approach, and RTA. However, without a statistical model the
decompression program has no way of uncompressing the codes in the compressed files
back into the symbols of the source file, so inclusion of the compression model is
necessary. Our implementations of Huffman, two-tree, and RTA all pass their static
models to the decompression program in the same way. Each includes a frequency
ordered list of all the leaves of their Huffman tree(s) in their respective header. The
header includes other information as well (see Appendix E for details), but none so
voluminous as the leaf list.

The symbols we shall encode in our Huffman code are binary n-tuples. A first
pass through the data determines their probability of occurrence in the text. Armed with
these probabilities, a Huffman code can be made for the file under consideration, and the
leaves in the Huffman tree generated are the n-tuples appearing in the file. Then, in the
worst case, the leaf list for Huffman encoding will contain 2^n entries of n bits each. This
means the leaf list for a standard Huffman tree can require up to n * (2^n) / 8 bytes of
space.

In order to gain insight into the amount of space consumed by the leaf lists of 
RTA, we again turn to our failed two-tree approach. We noted earlier that two-tree had a 
significantly smaller header than standard Huffman encoding. This is a consequence of 
the space reduction afforded by using necklace class indices as leaves of the i-tree. Even 
though two-tree must put two leaf lists in its header, one for the r-tree and another for the 
i-tree, we see that the two-tree method still requires much less overhead than standard 
Huffman. The length and number of the leaves in the leaf lists of the two-tree method are 
governed by the properties of binary necklace classes discussed in chapter III. The
relevant information is summarized in Table 7.



          Leaf Length (bits)      Upper Bound on Number of Leaves
r-tree    ⌈log2 n⌉                n
i-tree    n + 1 - ⌈log2 n⌉        2^(n + 1 - ⌈log2 n⌉)


Table 7: Leaf List Properties

An upper bound U on the maximum leaf list size (in bytes) for the two-tree
approach is found by summing the product of the leaf length and the number of leaves for
each leaf list, then dividing by 8, or

\[ U = \left( \lceil \log_2 n \rceil \cdot n + (n + 1 - \lceil \log_2 n \rceil) \cdot 2^{\,n + 1 - \lceil \log_2 n \rceil} \right) / 8. \]

The following table points out the drastic differences possible between the leaf
lists of standard Huffman and those of the two-tree approach.

n                    4     8      12       16        20           24
Standard Huffman     8     256    4,096    65,536    1,048,576    16,777,216
Two-Tree Approach    4     51     582      13,320    131,085      2,621,455


Table 8: Worst Case Leaf List Comparisons
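
The Two-Tree Approach row of Table 8 follows from the bound U, read as U = (⌈log2 n⌉ * n + (n + 1 - ⌈log2 n⌉) * 2^(n + 1 - ⌈log2 n⌉)) / 8 and rounded up to a whole byte. The sketch below evaluates that expression; the rounding rule and all names are assumptions made for illustration.

```java
final class LeafListBoundSketch {

    /** ceil(log2 n) for n >= 1. */
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    /** Worst-case two-tree leaf list size in bytes, rounded up to a whole byte. */
    static long twoTreeBytes(int n) {
        int rLen = ceilLog2(n);                 // bits per r-tree leaf
        int iLen = n + 1 - rLen;                // bits per i-tree leaf
        long bits = (long) rLen * n             // at most n leaves in the r-tree
                  + (long) iLen * (1L << iLen); // at most 2^iLen leaves in the i-tree
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 12, 16, 20, 24}) {
            System.out.println("n = " + n + ": " + twoTreeBytes(n) + " bytes");
        }
    }
}
```

For n = 4, 8, 12, 16, 20 and 24 the sketch yields 4, 51, 582, 13,320, 131,085 and 2,621,455 bytes.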

When run on typical data, the actual differences between the leaf lists of the two
approaches are never as dramatic as the worst-case scenario shown above. If they had been,
two-tree would likely have been a viable approach.

We now draw comparative conclusions between the leaf lists of RTA and those of
two-tree. The r-trees created by the two approaches are identical. Therefore, the r-tree
leaf lists are identical. The difference between the two approaches is simply the number
of i-trees created. Two-tree creates one long i-tree leaf list. RTA creates multiple shorter
i-tree leaf lists. Since both approaches process the same r, i pairs, it is tempting to
conclude that the concatenation of the smaller leaf lists of RTA would result in the longer
leaf list of two-tree. If this were true, the overhead of the two approaches would be
almost identical. An examination of the distributed tree depicted in Figure 10 above,
however, reveals the flaw in this supposition. There are overlaps between the leaves of
several of the i-trees. Specifically, i-trees 1, 4, and 5 all have an i = 13 leaf. This
demonstrates that in RTA some of the leaves may occur more than once in the header.
Thus, the header of RTA will be larger than that of two-tree by the number of
overlapping leaves.

The actual number of overlapping leaves in RTA is data dependent. If the
number of overlapping leaves is low, we expect a header size similar to that of two-tree
and well below that of Huffman. If the number of overlapping leaves in RTA is high, it
is possible that the header could approach or even exceed that of standard Huffman. The
actual header sizes found for the test suite files using RTA and standard Huffman
encoding are shown below.


                 Selected n
File             8       12       16        20
kennedy.xls      253     1552     2899      26869
plrabn12.txt     98      988      1891      12346
lcet10.txt       100     1443     2931      16068
asyoulik.txt     86      1056     1817      10164
alice29.txt      91      1107     1965      10074
grammar.lsp      93      493      652       1501
cp.html          104     1109     2063      6351
fields.c         106     821      1150      3104
xargs.1          91      601      802       1926
sum              250     2073     4058      10274
ptt5             179     961      3958      11429
lenna            255     4499     58125     358290


Table 9: Header Size for RTA

                 Selected n
File             8       12       16        20
kennedy.xls      276     1883     3342      33171
plrabn12.txt     98      1188     2203      15159
lcet10.txt       99      1786     3456      19763
asyoulik.txt     83      1283     2111      12444
alice29.txt      90      1348     2289      12337
grammar.lsp      91      576      728       1746
cp.html          100     1357     2409      7715
fields.c         106     984      1311      3703
xargs.1          87      711      904       2260
sum              273     2608     4805      12557
ptt5             177     1145     4670      13971
lenna            278     5768     71159     447245


Table 10: Header Size for Standard Huffman Encoding


Tables 9 and 10 demonstrate that the RTA header is in fact typically smaller than
the header of standard Huffman encoding at n = 8. Further, it is apparent that as n
increases the gap in header size widens in favor of RTA over Huffman.
This fits well with the empirical data (see Appendix B) that shows RTA performing best
at large values of n. Thus, we have probably identified the key element responsible for
the (limited) success of RTA: decreased header size. This is no surprise, as headers are
the likely place to economize over the provably optimal encoding.

G. SUMMARY

We now summarize our conclusions from the previous section regarding RTA vs.
standard Huffman encoding.

1. RTA compresses typical text data to within ±0.5% of Huffman encoding.

2. RTA is computationally more expensive than Huffman encoding, though
not prohibitively so.

3. RTA is space (memory) inefficient to the point of being infeasible.

4. The non-header portion of a file compressed by RTA typically comes 
within 5% of the non-header portion of the same file compressed by 
standard Huffman encoding. 

5. RTA has a more compact header than standard Huffman encoding. The
difference between the headers increases as n increases. Thus, RTA
performs best on large files with a high n.

V. INDEXED TREE APPROACH

A. INTRODUCTION 

In this chapter, we attempt to improve upon RTA by causing the distributed tree
to more closely approach an optimal Huffman tree. We also seek to further decrease the
size of the header and to reduce the memory requirements of our earlier approach.

One way to improve the distributed tree created by RTA is to decrease the depth 
and breadth of the i-trees. It is tempting to conclude that this can be accomplished simply 
by increasing the total number of i-trees. Unfortunately, increasing the number of i-trees 
requires an increase in the length of each r-code, which could prove detrimental to the 
overall compression ratio. Thus, we consider a more conservative approach to improving 
the distributed tree of RTA. 



[Chart; horizontal axis: Index Tree, values 1 through 7]


Chart 1: Dispersion of ASCII Characters by RTA (n = 8)

Chart 1 summarizes the mapping between ASCII keyboard characters and the
index trees of RTA (see Appendix B for details). The graph provides some indication of
how good a job RTA is doing in dispersing i values to its forest of i-trees. It seems
reasonable that a more even distribution between index trees would benefit the distributed
tree of RTA. In fact, analysis of the actual index trees created by RTA when run on the
test suite data shows that as n increases many trees are often left empty. This suggests that
we can enhance the distributed tree of RTA by somehow doing a better job of evenly
dispersing i values to the i-tree forest.

B. EXPLANATION 

Our Indexed Tree Approach (ITA) completely discards necklace classes and
rotations as a mechanism to create a Huffman forest. Instead, as we read an input symbol
S we use the first few bits of S (iBits) as a straightforward index into our Huffman forest.
The remaining bits of S (jBits) are then used as a leaf value for the tree indexed by iBits.
We make the following clarifying remarks:

• S is the concatenation of iBits with jBits.

• |iBits| + |jBits| = n.

• The integer representation of the unsigned value iBits is denoted by [iBits].

ITA proceeds as follows: We first read n-bit symbols from a source. We then divide
each symbol into its respective iBits and jBits components. We put the iBits into one
Huffman tree, and distribute the jBits into a forest of Huffman trees based upon the value
of their associated iBits components. Finally, we write the Huffman codes for each iBits,
jBits pair to the output file in the order they were originally discovered.

Clearly, ITA is a derivative of RTA. ITA creates a similar Huffman forest, and
constructs similar ordered pairs of Huffman codes. It is, however, much more flexible
than RTA. In RTA, a given value n determines the length of the r- and i-components of
the algorithm. In ITA, the sum of |iBits| and |jBits| determines the value of n. Thus, there
is only one way for RTA to compress a given file at a given n, while there are many ways
for ITA to compress the same file using the same n. For example, if n = 12, ITA can be
executed with the following (|iBits|, |jBits|) pairs: (1,11), (2,10), (3,9), (4,8), (5,7),
(6,6), (7,5), (8,4), (9,3), (10,2) and (11,1).

C. ALGORITHM 

1. Build a static statistical model of the data as follows: 

a. Construct n + 1 empty frequency tables (indexed from 0 to n) where n>l 
represents the length in bits of the symbols to read from the input stream. 

b. Read the next n-bit symbol S from the input stream. 

c. Divide S into its iBits and jBits components. 

d. If iBits does not exist in frequency table n, put iBits into frequency table n
with frequency 1. Else, increment the frequency count of iBits in table n.

e. If jBits does not exist in frequency table [iBits], put jBits into frequency
table [iBits] with frequency 1. Else, increment the frequency count of
jBits in table [iBits].

f. While more symbols remain in the input stream, repeat from step b.

2. Create a Huffman tree from each of the n + 1 frequency tables. The leaves of
Huffman tree n represent iBits. The leaves of all other Huffman trees
represent jBits.

3. Reset the input pointer to the beginning of the input stream.

4. Read the next n-bit symbol S from the input stream.

5. Divide S into its iBits and jBits components.

6. Write the Huffman code for iBits in Huffman tree n to the output stream.

7. Write the Huffman code for jBits in Huffman tree [iBits] to the output stream.

8. While more symbols remain in the input stream, repeat step 4. 
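
Step 1 of the ITA algorithm can be sketched as follows. This is an illustrative sketch, not the Appendix A implementation: instead of numbering the tables in advance, it keeps one jBits table per distinct [iBits] value in a map, which is an assumption of the sketch. All names, and the reuse of the 54-character example message from chapter IV with (|iBits|, |jBits|) = (4, 4), are likewise illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

final class ItaModelSketch {
    final Map<Integer, Integer> iTable = new TreeMap<>();                 // [iBits] -> count
    final Map<Integer, Map<Integer, Integer>> jTables = new TreeMap<>();  // [iBits] -> ([jBits] -> count)

    /** Splits an n-bit symbol into {[iBits], [jBits]} where |iBits| = iLen. */
    static int[] split(int s, int n, int iLen) {
        int jLen = n - iLen;
        return new int[] {s >>> jLen, s & ((1 << jLen) - 1)};
    }

    /** Step 1 of ITA: count every iBits value and, per iBits value, every jBits value. */
    static ItaModelSketch build(int[] symbols, int n, int iLen) {
        ItaModelSketch model = new ItaModelSketch();
        for (int s : symbols) {
            int[] parts = split(s, n, iLen);
            model.iTable.merge(parts[0], 1, Integer::sum);
            model.jTables.computeIfAbsent(parts[0], key -> new TreeMap<>())
                         .merge(parts[1], 1, Integer::sum);
        }
        return model;
    }

    public static void main(String[] args) {
        String message = "acgCs7bnacgCs7bnacgCs7bnggCCCCssss77777bnbnbnbnbnbnbnn";
        ItaModelSketch model = build(message.chars().toArray(), 8, 4);
        System.out.println("iBits table:  " + model.iTable);
        System.out.println("jBits tables: " + model.jTables);
    }
}
```

Because the split is a shift and a mask, no class tables are needed, which is in line with the smaller memory footprint reported for ITA.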

D. INTERPRETATION OF RESULTS 

We successfully implemented ITA in Java (see Appendix A) and collected
empirical data (see Appendix B). The compression performance of ITA is
typically superior to that of standard Huffman encoding (and RTA) by a margin of one to
three percent on text data. This difference is due to further decreases in the compressed
file header size. ITA's performance is roughly equivalent to that of Huffman encoding
on uncompressed bitmapped images. ITA is computationally more expensive than
Huffman encoding, though like RTA, the difference is not apparent to the user. ITA does
not suffer from the memory problems of RTA. In fact, it generally has a smaller memory
footprint than does Huffman encoding.

Inspection of Table 12 below (extracted from our empirical results; see Appendix 
B) reveals that the performance gains made by ITA over Huffman and RTA are primarily 
attributable to further decreases in compressed file header size. 

                    Optimal              Huffman  RTA      ITA      Huffman     RTA         ITA
File           n    (|iBits|, |jBits|)   Header   Header   Header   Non-Header  Non-Header  Non-Header
------------   --   ------------------   -------  -------  -------  ----------  ----------  ----------
kennedy.xls     8   (4, 4)                   276      253      191      462532      528207      516085
               12   (5, 7)                  1883     1552     1324      495537      502365      505657
               16   (7, 9)                  3342     2899     2476      410583      422265      421657
               20   (9, 11)                33171    26869    20596      422785      426101      428105
plrabn12.txt    8   (4, 4)                    98       98       88      275585      288973      283622
               12   (4, 8)                  1188      988      910      306646      308099      306952
               16   (5, 11)                 2203     1891     1630      238776      241024      239739
               20   (9, 11)                15159    12346     9652      255407      255701      256850
lcet10.txt      8   (4, 4)                    99      100       85      250565      260360      255465
               12   (4, 8)                  1786     1443     1324      280698      282929      281309
               16   (5, 11)                 3456     2931     2499      218529      220143      218889
               20   (9, 11)                19763    16068    12453      231209      231767      232403
asyoulik.txt    8   (4, 4)                    83       86       76       75806       78363       77650
               12   (5, 7)                  1283     1056      907       83237       83700       83528
               16   (7, 9)                  2111     1817     1469       64531       64969       64891
               20   (9, 11)                12444    10164     8151       67550       67644       67834
alice29.txt     8   (4, 4)                    90       91       82       87688       91563       90150
               12   (4, 8)                  1348     1107     1020       98274       98839       98409
               16   (5, 11)                 2289     1965     1681       76120       77014       76522
               20   (9, 11)                12337    10074     8076       80171       80324       80626
grammar.lsp     8   (4, 4)                    91       93       83        2170        2227        2205
               12   (5, 7)                   576      493      466        2289        2319        2284
               16   (7, 9)                   728      652      630        1703        1715        1718
               20   (6, 14)                 1746     1501     1410        1645        1656        1650
cp.html         8   (4, 4)                   100      104       89       16199       16389       16575
               12   (6, 6)                  1357     1109      927       17268       17422       17432
               16   (7, 9)                  2409     2063     1652       13337       13389       13419
               20   (8, 12)                 7715     6351     5390       12909       12974       12967
fields.c        8   (2, 6)                   106      106       90        7026        7104        7027
               12   (5, 7)                   984      821      701        7411        8391        8290
               16   (6, 10)                 1311     1150      972        5529        5538        5552
               20   (7, 13)                 3703     3104     2808        5408        5436        5429
xargs.1         8   (2, 6)                    87       91       81        2602        2662        2645
               12   (5, 7)                   711      601      528        2792        2809        2802
               16   (7, 9)                   904      802      697        2113        2127        2122
               20   (8, 12)                 2260     1926     1780        2000        2006        2009
sum             8   (2, 6)                   273      250      229       25645       26706       26238
               12   (5, 7)                  2608     2073     1735       24733       25221       24984
               16   (7, 9)                  4805     4058     3213       19977       20242       20167
               20   (8, 12)                12557    10274     8681       19355       19503       19440
ptt5            8   (3, 5)                   177      179      160      106551      158044      158279
               12   (4, 8)                  1145      961      866       86993      119953      119939
               16   (6, 10)                 4670     3958     3227       76523      100685      100568
               20   (9, 11)                13971    11429     8911       70124       88956       88886


Table 12: Header and Non-header Compression Results (in bytes) 
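
As a worked illustration of how to read Table 12, the short Java sketch below adds the header and non-header figures for the alice29.txt row at n = 16. The class and method names are hypothetical and are not part of the thesis code; the byte counts are copied from the table.

```java
// Hypothetical helper: totals one row of Table 12 (alice29.txt, n = 16).
class HeaderVersusBody {

    /** Total compressed size in bytes: header plus non-header portion. */
    static long total(long headerBytes, long nonHeaderBytes) {
        return headerBytes + nonHeaderBytes;
    }

    public static void main(String[] args) {
        long huffman = total(2289, 76120);
        long rta = total(1965, 77014);
        long ita = total(1681, 76522);

        System.out.println("Huffman total: " + huffman);
        System.out.println("RTA total:     " + rta);
        System.out.println("ITA total:     " + ita);

        // ITA gives up some ground in the body but wins more back in the header.
        System.out.println("ITA header saving vs Huffman:   " + (2289 - 1681));
        System.out.println("ITA non-header cost vs Huffman: " + (76522 - 76120));
    }
}
```

For this row the header saving outweighs the larger non-header portion, which is the pattern the surrounding text attributes ITA's gains to.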
In order to explain the decrease in header size from RTA to ITA, we draw 
attention to the "optimal (|iBits|, |jBits|)" values displayed in column 3 of Table 12. In 
most cases |iBits| and |jBits| are tightly centered on n/2. The reason for this goes back to 
the leaf lists discussed in Chapter 4. The length and number of leaves in the leaf lists of a 
single iBits and jBits tree are given below. 



               Leaf Length    Upper Bound on
               (bits)         Number of Leaves
------------   -----------    ----------------
iBits tree     |iBits|        2^|iBits|
jBits tree     |jBits|        2^|jBits|


Table 13: Leaf Lists in a Single iBits and jBits Tree 


An upper bound on the maximum leaf list size in bytes for a single iBits and jBits tree is 
found by summing the product of the leaf length and the number of leaves for each leaf 
list, then dividing by 8. Table 14, below, shows how the upper bound for the size of the 
combined leaf lists of a single iBits and jBits tree changes as |iBits| increases and |jBits| decreases. 


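n = 10

(|iBits|, |jBits|)    Sum of Leaf List Upper Bounds (bytes)
------------------    -------------------------------------
(1, 9)                576
(2, 8)                257
(3, 7)                115
(4, 6)                 56
(5, 5)                 40
(6, 4)                 56
(7, 3)                115
(8, 2)                257
(9, 1)                576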


Table 14: Sum of Leaf List Upper Bounds for a single iBits and jBits Tree 

Table 14 illustrates that (|iBits|, |jBits|) = (n/2, n/2) minimizes the function 
f(|iBits|, |jBits|) = (|iBits| * 2^|iBits| + |jBits| * 2^|jBits|) / 8. Thus (|iBits|, |jBits|) = (n/2, n/2) 
produces a minimal size header for a hypothetical two-tree ITA. Since the actual ITA 
uses many jBits trees we would expect some repeated leaves in the header (just as in 
RTA). Thus, while it is possible for the ITA header to exceed the shown values, the table 
suggests optimal (|iBits|, |jBits|) pairs that fit well with the empirical data. 
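
The bound behind Table 14 can be checked with a few lines of Java. This is a minimal sketch under two assumptions: the class and method names are hypothetical, and fractional bytes are rounded down (integer division), which is what the tabulated values suggest.

```java
// Hypothetical sketch: combined leaf-list upper bound for one iBits tree
// and one jBits tree, in bytes.
class LeafListBound {

    /**
     * (iLen * 2^iLen + jLen * 2^jLen) / 8, rounded down.
     * Each tree has at most 2^len leaves of len bits each.
     */
    static long bound(int iLen, int jLen) {
        long bits = (long) iLen * (1L << iLen) + (long) jLen * (1L << jLen);
        return bits / 8;
    }

    public static void main(String[] args) {
        int n = 10;
        // Walk every split of n bits into a non-empty prefix and suffix.
        for (int i = 1; i < n; i++) {
            System.out.println("(" + i + ", " + (n - i) + ") -> " + bound(i, n - i));
        }
    }
}
```

Because the expression is symmetric in its two arguments and each term grows exponentially, the sum is smallest when the two lengths are equal, which is why the table bottoms out at (5, 5).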

Naturally, (|iBits|, |jBits|) = (n/2, n/2) need not always be the exact optimal value. 
Indeed, this decomposition is only optimal in about 15% of our test cases. The most 
common optimal (|iBits|, |jBits|) pairs for n = 8, 12, 16, and 20 are (4, 4), (5, 7), (7, 9), 
and (9, 11) respectively. Thus, there are other factors at work besides simply header 
overhead. The most critical factor in overall compression results for ITA is the degree of 
similarity between ITA's distributed tree and an optimal Huffman tree constructed with 
the same data. If the distributed tree performs poorly, any gains made by reduced header 
overhead are quickly lost. 

As mentioned earlier, ITA is more memory efficient than standard Huffman 
encoding. The reasons for this are similar to those behind the smaller header: it is 
simply cheaper to build many small Huffman trees than it is to build a single large one. 

Since ITA's performance gains over RTA are due to smaller header size, we 
conclude that ITA is no better at dispersing jBits than RTA was at dispersing i-values. 
Thus, although we are able to enhance the overall compression performance of RTA, we 
are unable to improve its distributed tree. 
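
The split of an n-bit source string into iBits and jBits referred to throughout this chapter can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, and it assumes the leading |iBits| bits select a tree while the trailing |jBits| bits are what that tree encodes, which is our reading of "the first few bits of each string".

```java
// Hypothetical sketch of splitting an n-bit value into a leading iBits
// field and a trailing jBits field. Assumes 0 < iLen < n < 32.
class IndexedSplit {

    /** The leading iLen bits of an n-bit value. */
    static int iBits(int value, int n, int iLen) {
        return value >>> (n - iLen);
    }

    /** The trailing (n - iLen) bits of an n-bit value. */
    static int jBits(int value, int n, int iLen) {
        int jLen = n - iLen;
        return value & ((1 << jLen) - 1);
    }

    public static void main(String[] args) {
        int n = 12;
        int iLen = 5;                       // the (5, 7) split from Table 12
        int value = Integer.parseInt("101101100011", 2);

        int i = iBits(value, n, iLen);      // leading bits 10110
        int j = jBits(value, n, iLen);      // trailing bits 1100011
        System.out.println("iBits = " + Integer.toBinaryString(i));
        System.out.println("jBits = " + Integer.toBinaryString(j));
    }
}
```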
VI. CONCLUSIONS 


A. SUMMARY OF MAIN RESULTS 

We have presented two lossless compression techniques. Both techniques use a 
static statistical data model. Our first approach (RTA) involves binary necklace classes 
and multiple Huffman trees. 

1. RTA compresses typical text data to within ±0.5% of standard Huffman 
encoding. 

2. RTA is computationally more expensive than Huffman encoding, though not 
prohibitively so. 

3. RTA is space- (memory-) inefficient to the point of being infeasible. 

4. The non-header portion of a file compressed by RTA typically comes within 
5% of the non-header portion of the same file compressed by standard 
Huffman encoding. 

5. RTA has a more compact header than standard Huffman encoding. The 
difference between the headers increases as n increases. Thus, RTA performs 
best on large files with a high n. 
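
To make the notion of a necklace class and its rotations concrete, the sketch below reduces an n-bit string to a class representative plus a rotation count. It is a minimal illustration, not the RTA implementation: the names are hypothetical, and choosing the numerically smallest rotation as the representative is an assumption (the thesis may use a different convention).

```java
// Hypothetical sketch: find the smallest rotation of an n-bit value and
// how many left rotations reach it. Assumes 0 < n < 32 and that value
// fits in n bits.
class NecklaceRotation {

    /** Rotates the low n bits of value left by one position. */
    static int rotateLeft(int value, int n) {
        int mask = (1 << n) - 1;
        return ((value << 1) | (value >>> (n - 1))) & mask;
    }

    /**
     * Returns {representative, rotations}: the smallest of the n rotations
     * of value, and the number of left rotations needed to obtain it.
     */
    static int[] canonical(int value, int n) {
        int best = value;
        int bestRotations = 0;
        int current = value;
        for (int r = 1; r < n; r++) {
            current = rotateLeft(current, n);
            if (current < best) {
                best = current;
                bestRotations = r;
            }
        }
        return new int[] { best, bestRotations };
    }

    public static void main(String[] args) {
        int[] result = canonical(Integer.parseInt("0110", 2), 4);
        System.out.println("representative = " + Integer.toBinaryString(result[0])
                + ", rotations = " + result[1]);
    }
}
```

All rotations of one string share the same representative, so the pair (representative, rotation count) identifies the original string within its class.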
Our second lossless compression approach (ITA) uses the same basic model as 
RTA, but abandons binary necklaces in favor of a simpler and more efficient tree indexing 
mechanism. 

6. ITA compresses typical text data one to three percent better than standard 
Huffman encoding. Compression of uncompressed bitmapped images is 
roughly equivalent to that of Huffman encoding. 

7. ITA is computationally more expensive than Huffman encoding, though not 
prohibitively so. 

8. ITA requires less memory than standard Huffman encoding. 

9. The non-header portion of a file compressed by ITA typically comes within 
5% of the non-header portion of the same file compressed by standard 
Huffman encoding. 

10. ITA has a more compact header than standard Huffman encoding. The 
difference between the headers tends toward a maximum at values centered on 
n/2 and becomes more pronounced as n increases. 

B. FUTURE RESEARCH 

This paper explores the use of binary necklace classes for lossless compression 
using a static statistical model. Our conclusions indicate that it is cheaper to pass header 
data for many small trees than for a single large tree. Our approaches use one level of 
indirection (one thing pointing at many things): a single tree pointing at a single forest. 
An interesting extension might be to have two or 
more levels of indirection. With two levels of indirection a single tree would point at a 
forest of trees, and each tree in the forest would then point at another forest. This 
approach would greatly reduce the leaf length of each tree. The drawback is that the total 
number of trees would increase exponentially, thus greatly increasing the chance of 
overlapping leaves, something we found to be a drawback. 

A mathematical topic closely related to necklace classes is that of Lyndon words. 
Lyndon words offer an opportunity to explore lossless compression using a dictionary-based 
model. Lyndon words exhibit some interesting properties that may be 
applicable to data compression. Chapter 3 includes a short discussion of Lyndon words. 
Appendix A contains some preliminary Java code that generates an interesting subset 
of the Lyndon words. 
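
For readers unfamiliar with the term, the following sketch tests whether a bit string is a Lyndon word under the usual definition: a non-empty string that is strictly smaller, in lexicographic order with 0 < 1, than each of its proper rotations. The class and method names are hypothetical, and the code in Appendix A may orient the bit order differently.

```java
// Hypothetical sketch: brute-force Lyndon word test on a string of '0'/'1'.
class LyndonCheck {

    /** True if s is strictly smaller than every proper rotation of itself. */
    static boolean isLyndon(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int k = 1; k < s.length(); k++) {
            String rotation = s.substring(k) + s.substring(0, k);
            if (s.compareTo(rotation) >= 0) {
                return false;   // some rotation is smaller or equal
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "0011", "0101", "10", "1" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isLyndon(sample));
        }
    }
}
```

Periodic strings such as 0101 fail the test because one rotation equals the string itself, which is what separates Lyndon words from necklace representatives in general.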
APPENDIX A: JAVA CODE 


package thesis.compression.multi_tree; 

/** 
 * Allows C++ style assertions to be inserted in Java code for testing 
 * and debugging. 
 * 
 * Note: "assert" became a reserved word in Java 1.4, so this listing 
 * compiles only at source level 1.3 or earlier. 
 */ 
public class Assertion { 

    public static boolean assertOn = true; 

    private Assertion() {} // no public constructor 

    public static void assert(boolean validFlag) 
            throws AssertionException { 
        if (assertOn && !validFlag) { 
            throw new AssertionException(); 
        } 
    } 

    public static void assert(boolean validFlag, String msg) 
            throws AssertionException { 
        if (assertOn && !validFlag) { 
            throw new AssertionException(msg); 
        } 
    } 
} 


package thesis.compression.multi_tree; 

public class AssertionException extends RuntimeException { 
    public AssertionException() { 
        super("AssertionException"); 
    } 

    public AssertionException(String msg) { 
        super(msg); 
    } 
} 


package thesis.compression.multi_tree; 

/** 
 * This class is an abstraction for a string of bits. BitString objects 
 * have length and bitPattern attributes. This implementation limits the 
 * length of a BitString to 32 bits. 
 * BitString objects are mutable. This means that both the length and 
 * bitPattern attributes of an instance can be changed by setter methods 
 * after the instantiation of the object. 
 */ 
public class BitString implements Comparable { 

    public static final short MAXLEN; 
    public static final int[] MASK;    // MASK[i]: only bit i set 
    public static final int[] MASK_R;  // MASK_R[i]: bits 0..i set 
    public static final int[] MASK_L;  // MASK_L[i]: bits i..31 set 
    static { 
        MAXLEN = 32; 
        MASK = new int[MAXLEN]; 
        MASK_R = new int[MAXLEN]; 
        MASK_L = new int[MAXLEN]; 
        for (int i = 0; i < MAXLEN; i++) { 
            MASK[i] = 1 << i; 
            MASK_R[i] = -1 >>> 31 - i; 
            MASK_L[i] = -1 << i; 
        } 
    } 

    private short length; 
    private int bitPattern; 

    /** 
     * Returns one more than the number of bits to the right of the most 
     * significant bit in bitPattern. Alternatively, this may be thought of 
     * as the minimum length required to capture all the significant bits 
     * of bitPattern. 
     */ 
    public static short length(int bitPattern) { 
        for (int i = 0; i < 32; i++) { 
            if (bitPattern < 0) { 
                return (short)(32 - i); 
            } 
            bitPattern <<= 1; 
        } 
        return 0; 
    } 

    /** 
     * Operates like System.arraycopy except on BitStrings instead of arrays. 
     * 
     * @throws IndexOutOfBoundsException if a copy operation would exceed 
     * the bounds of either the source or destination BitString object. 
     */ 
    public static void bitCopy(BitString src, 
                               int srcPos, 
                               BitString dst, 
                               int dstPos, 
                               int numBits) throws IndexOutOfBoundsException { 
        int srcEnd = srcPos + numBits - 1; 
        int dstEnd = dstPos + numBits - 1; 
        if (srcPos < 0 || dstPos < 0 || numBits < 0 || 
            srcEnd >= src.length || dstEnd >= dst.length) { 
            throw new IndexOutOfBoundsException(); 
        } 

        // isolate the source bit field 
        int s = src.bitPattern; 
        int d = dst.bitPattern; 
        int m = MASK_L[srcPos] & MASK_R[srcEnd]; 
        s &= m; 

        // align the mask and source bit field with the destination bit field 
        if (srcPos < dstPos) { 
            int lshift = dstPos - srcPos; 
            s <<= lshift; 
            m <<= lshift; 
        } else { 
            int rshift = srcPos - dstPos; 
            s >>>= rshift; 
            m >>>= rshift; 
        } 

        // clear the dest bit field then OR in the source bit field 
        dst.bitPattern = s | (d & (~m)); 
    } 

    /** 
     * Returns the BitString object represented by str. 
     * 
     * @throws NumberFormatException just as Integer.parseInt(str, 2) would. 
     */ 
    public static BitString parseBitString(String str) throws 
            NumberFormatException { 
        return new BitString(str.length(), Integer.parseInt(str, 2)); 
    } 

    /** 
     * Creates a BitString[] to represent intArr. All BitStrings of the 
     * returned array will be long enough to contain the greatest value in 
     * the int[]. 
     */ 
    public static BitString[] toBitStringArr(int[] intArr) { 
        if (intArr == null) return null; 

        // find greatest int value in the array 
        int greatest = intArr[0]; 
        for (int i = 1; i < intArr.length; i++) { 
            if (intArr[i] > greatest) { 
                greatest = intArr[i]; 
            } 
        } 
        int bitsNeeded = (int)BitString.length(greatest); 

        // create the BitString[] 
        BitString[] retValue = new BitString[intArr.length]; 
        for (int i = 0; i < retValue.length; i++) { 
            retValue[i] = new BitString(bitsNeeded, intArr[i]); 
        } 

        return retValue; 
    } 

    /** Constructs a default BitString object of length = 0, bitPattern = 0 */ 
    public BitString() { 
        length = 0; 
        bitPattern = 0; 
    } 

    /** 
     * Constructs a BitString object to represent bp 
     * 
     * @throws NumberFormatException exactly as Integer.parseInt(bp, 2) 
     */ 
    public BitString(String bp) throws NumberFormatException { 
        bitPattern = Integer.parseInt(bp, 2); 
        length = (short)bp.length(); 
    } 

    /** 
     * Constructs a BitString object of length len with the bitPattern 
     * attribute represented by bp. 
     * 
     * @throws NumberFormatException exactly as Integer.parseInt(bp, 2) 
     */ 
    public BitString(short len, String bp) throws NumberFormatException { 
        reInit(len, Integer.parseInt(bp, 2)); 
    } 

    /** Constructs a BitString object of length len and bitPattern bp. */ 
    public BitString(short len, int bp) { 
        reInit(len, bp); 
    } 

    /** Constructs a BitString object of length len and bitPattern 0. */ 
    public BitString(short len) { 
        reInit(len, 0); 
    } 

    /** Constructs a BitString object of minimum length needed to 
        properly represent bp. */ 
    public BitString(int bp) { 
        bitPattern = bp; 
        length = length(bp); 
    } 

    /** Constructs a BitString object that is a copy of bp. */ 
    public BitString(BitString bp) { 
        reInit(bp); 
    } 

    /** allows package classes to create BitStrings without bounds 
        checking */ 
    BitString(int len, int bp) { 
        bitPattern = bp; 
        length = (short)len; 
    } 

    /** Returns true if the bit at bit position index is a 1. */ 
    public final boolean bitAt(int index) throws 
            IndexOutOfBoundsException { 
        boundsCheck(index); 
        return (MASK[index] & bitPattern) != 0; 
    } 

    /** Sets the bit at bit position index to 1. */ 
    public final void setBit(int index) throws 
            IndexOutOfBoundsException { 
        boundsCheck(index); 
        bitPattern |= MASK[index]; 
    } 

    /** Clears the bit at bit position index */ 
    public final void clearBit(int index) throws 
            IndexOutOfBoundsException { 
        boundsCheck(index); 
        bitPattern &= ~MASK[index]; 
    } 

    /** Sets the bit at bit position index if value is true, else 
        clears the bit. */ 
    public final void assignBit(int index, boolean value) throws 
            IndexOutOfBoundsException { 
        boundsCheck(index); 
        if (value == true) { 
            bitPattern |= MASK[index]; 
        } else { 
            bitPattern &= ~MASK[index]; 
        } 
    } 

    /** Returns the length of this BitString. */ 
    public final short length() { 
        return length; 
    } 

    /** 
     * Sets the length of this BitString to len. This may result in 
     * truncation if len < this.length. 
     * 
     * @throws LengthOutOfBoundsException if len < 0 or len > 
     * BitString.MAXLEN. 
     */ 
    public final void setLength(short len) throws 
            LengthOutOfBoundsException { 
        reInit(len, bitPattern); 
    } 

    /** Returns this BitString's bitPattern attribute. */ 
    public final int bitPattern() { 
        return bitPattern; 
    } 

    /** 
     * Sets this BitString's bitPattern attribute to bp. Note, if 
     * this.length is not great enough bp may be truncated. 
     */ 
    public final void setBitPattern(int bp) { 
        reInit(length, bp); 
    } 

    /** 
     * Sets this BitString's bitPattern to represent that of bp. Note, 
     * if this.length is not great enough bp may be truncated. 
     * 
     * @throws NumberFormatException exactly as Integer.parseInt(bp, 2). 
     */ 
    public final void setBitPattern(String bp) throws 
            NumberFormatException { 
        reInit(length, Integer.parseInt(bp, 2)); 
    } 

    /** allows package classes to reInit BitStrings without bounds 
        checking */ 
    final void reInit(int len, int bp) { 
        length = (short)len; 
        bitPattern = bp; 
    } 

    /** 
     * Reinitializes this BitString to length len and bitPattern bp 
     * 
     * @throws NumberFormatException exactly as Integer.parseInt(bp, 2) 
     */ 
    public final void reInit(short len, String bp) throws 
            NumberFormatException { 
        reInit(len, Integer.parseInt(bp, 2)); 
    } 

    /** 
     * Reinitializes this BitString to length len and bitPattern bp 
     * 
     * @throws LengthOutOfBoundsException if len < 0 OR len > MAXLEN. 
     */ 
    public final void reInit(short len, int bp) throws 
            LengthOutOfBoundsException { 
        if (len < 0 || len > MAXLEN) { 
            throw new LengthOutOfBoundsException(); 
        } else if (len == 0) { 
            length = 0; 
            bitPattern = 0; 
        } else { 
            length = len; 
            bitPattern = bp & MASK_R[length - 1]; 
        } 
    } 

    /** Reinitializes this BitString to bp */ 
    public final void reInit(BitString bp) { 
        length = bp.length; 
        bitPattern = bp.bitPattern; 
    } 

    /** AND's this BitString with bp. Only those bits that overlap 
     * between this BitString and bp are affected 
     */ 
    public final void and(BitString bp) { 
        if (bp.length == MAXLEN) { 
            // every bit overlaps; MASK_L[MAXLEN] would be out of range 
            bitPattern &= bp.bitPattern; 
            return; 
        } 
        bitPattern &= (MASK_L[bp.length] | bp.bitPattern); 
    } 

    /** OR's this BitString with bp. Only those bits that overlap 
     * between this BitString and bp are affected 
     */ 
    public final void or(BitString bp) { 
        if (length == 0) return; 
        bitPattern |= (bp.bitPattern & MASK_R[length - 1]); 
    } 

    /** 
     * Logically shifts the bits of this BitString one position to the 
     * left. If wrapOn is TRUE then the most significant bit will be 
     * shifted into the least significant bit position. 
     */ 
    public final int lShift(boolean wrapOn) { 
        if (length == 0) { 
            return 0; 
        } 
        int highBit = (MASK[length - 1] & bitPattern) != 0 ? 1 : 0; 
        bitPattern <<= 1; 
        bitPattern &= MASK_R[length - 1]; 
        if (wrapOn) { 
            bitPattern |= highBit; 
        } 
        return highBit; 
    } 

    /** 
     * Logically shifts the bits of this BitString numBits % this.length 
     * to the left. If wrapOn is TRUE then bits are shifted in from the 
     * right as they are shifted out from the left. 
     */ 
    public final void lShift(int numBits, boolean wrapOn) { 
        if (numBits < 0) { 
            rShift(numBits * -1, wrapOn); 
        } else if (numBits == 0 || bitPattern == 0) { 
            ; // nothing to shift 
        } else if (!wrapOn) { 
            // lShift with true has bug 
            bitPattern <<= numBits; 
            bitPattern &= MASK_R[length - 1]; 
        } else { 
            int nBits = numBits % length; 
            if (nBits == 0) return; // a full rotation leaves the bits unchanged 
            int m = bitPattern & (MASK_R[length - 1] & MASK_L[length - nBits]); 
            bitPattern &= ~m; 
            bitPattern <<= nBits; 
            bitPattern |= (m >>> (length - nBits)); 
        } 
    } 

    /** 
     * Logically shifts the bits of this BitString one position to the 
     * right. If wrapOn is TRUE then the least significant bit will be 
     * shifted into the most significant bit position. 
     */ 
    public final int rShift(boolean wrapOn) { 
        int lowBit = bitPattern & 1; 
        bitPattern >>>= 1; 
        if (wrapOn && lowBit == 1) { 
            bitPattern |= MASK[length - 1]; 
        } 
        return lowBit; 
    } 

    /** 
     * Logically shifts the bits of this BitString numBits % this.length 
     * to the right. If wrapOn is TRUE then bits are shifted in from the 
     * left as they are shifted out from the right. 
     */ 
    public final void rShift(int numBits, boolean wrapOn) { 
        if (numBits < 0) { 
            lShift(numBits * -1, wrapOn); 
        } else if (numBits == 0 || bitPattern == 0) { 
            ; // nothing to shift 
        } else if (!wrapOn) { 
            bitPattern >>>= numBits; 
        } else { 
            int nBits = numBits % length; 
            if (nBits == 0) return; // a full rotation leaves the bits unchanged 
            int m = bitPattern & MASK_R[nBits - 1]; 
            bitPattern >>>= nBits; 
            bitPattern |= (m << (length - nBits)); 
        } 
    } 

    /** Returns the position of the least significant 1 in this 
        BitString, or -1 if no bit is set */ 
    public final int leastSig1() { 
        int bp = bitPattern; 
        int i = 0; 
        // the bound on i keeps the loop from spinning when bitPattern == 0 
        while (i < MAXLEN && (bp & 1) == 0) { 
            bp >>>= 1; 
            i++; 
        } 
        return i < MAXLEN ? i : -1; 
    } 

    /** 
     * Concatenates rVal to this BitString. 
     * 
     * @throws LengthOutOfBoundsException if length + rVal.length > 
     * BitString.MAXLEN. 
     */ 
    public final void concat(BitString rVal) throws 
            LengthOutOfBoundsException { 
        short concatLen = (short)(length + rVal.length); 
        if (concatLen > MAXLEN) { 
            throw new LengthOutOfBoundsException( 
                "Concatenated result would exceed " + MAXLEN); 
        } 
        bitPattern <<= rVal.length; 
        bitPattern |= rVal.bitPattern; 
        length = concatLen; 
    } 

    public int hashCode() { 
        return bitPattern; 
    } 

    public String toString() { 
        if (length == 0) return "[0]"; 
        char[] out = new char[length]; 
        for (int i = 0; i < length; i++) { 
            out[length - 1 - i] = bitAt(i) ? '1' : '0'; 
        } 
        return String.valueOf(out); 
    } 

    public final int compareTo(BitString bs) { 
        if (length != bs.length) { 
            // the longer BitString is always considered larger 
            return length - bs.length; 
        } else if ((bitPattern | bs.bitPattern) < 0) { 
            // two MAXLEN BitStrings, one or both have their high bits set 
            if (bitPattern < 0 && bs.bitPattern >= 0) { 
                // this BitString negative, rhs non-negative 
                return 1; 
            } else if (bitPattern >= 0 && bs.bitPattern < 0) { 
                // this BitString non-negative, rhs negative 
                return -1; 
            } else { 
                // both negative 
                return (bitPattern & Integer.MAX_VALUE) - 
                       (bs.bitPattern & Integer.MAX_VALUE); 
            } 
        } else { 
            // two equal-length BitStrings w/o their high bits set 
            return bitPattern - bs.bitPattern; 
        } 
    } 

    public final int compareTo(Object obj) { 
        return compareTo((BitString)obj); 
    } 

    public final boolean equals(Object obj) { 
        if ((obj != null) && (obj instanceof BitString)) { 
            BitString bs = (BitString)obj; 
            return (length == bs.length) && (bitPattern == bs.bitPattern); 
        } 
        return false; 
    } 

    private final void boundsCheck(int index) throws 
            IndexOutOfBoundsException { 
        if (index < 0 || index >= length) { 
            throw new IndexOutOfBoundsException(); 
        } 
    } 
} 


package thesis.compression.multi_tree; 

import java.io.*; 
import java.util.*; 

/** 
 * Objects of this class are used to read data from a file. Data is 
 * read in chunks from 1 to BitString.MAXLEN bits. 
 */ 
public class BitStringReader implements ByteMask { 
    private static final int BUF_SIZE = 8192; 
    private static final int DEFAULT_N = 8; 

    private byte[] buf;               // holds bytes read in from inStream 
    private int endOfBuf;             // first unusable byte in buf 
    private int oldEndOfBuf;          // remembers endOfBuf in case reset() needs it 
    private int bufFilled;            // # times this buf was filled (and overwritten) 
    private long peekSizeAdj;         // corrects 'bits read' inaccuracy induced by peekBitString() 
    private String filename;          // name of the file to be opened / reopened 
    private FileInputStream inStream; // underlying data stream 
    private int n;                    // default BitString length to read 
    private int bytePos;              // byte marker within current buf 
    private int bitPos;               // bit marker within current byte 
    private BitString scratch;        // helps avoid unnecessary object creation 
    private long numDecoded;          // # Huffman codes decoded with this reader 
    private long decodeLimit;         // max # Huffman codes to decode with this reader 

    /** 
     * Constructs an instance that reads from file filename and uses n 
     * as the default number of bits to read for calls to readBitString(). 
     */ 
    public BitStringReader(String filename, int n) 
            throws 
                FileNotFoundException, 
                IOException, 
                LengthOutOfBoundsException 
    { 
        buf = new byte[BUF_SIZE]; 
        inStream = new FileInputStream(filename); 
        this.filename = filename; 
        setN(n); 
        endOfBuf = inStream.read(buf); 
        oldEndOfBuf = endOfBuf; 
        bufFilled = bytePos = 0; 
        bitPos = 7; 
        scratch = new BitString(); 
        numDecoded = peekSizeAdj = 0; 
        decodeLimit = Long.MAX_VALUE; 
    } 

    /** 
     * Constructs an instance that reads from file filename and uses 
     * DEFAULT_N as the default number of bits to read for calls to 
     * readBitString(). 
     */ 
    public BitStringReader(String filename) 
            throws 
                FileNotFoundException, 
                IOException, 
                LengthOutOfBoundsException 
    { 
        this(filename, DEFAULT_N); 
    } 

    /** 
     * Changes the number of bits read by a call to readBitString() 
     * from the current value to n. 
     * 
     * @throws LengthOutOfBoundsException if n < 0 or n > 
     * BitString.MAXLEN. 
     */ 
    public final void setN(int n) throws LengthOutOfBoundsException { 
        if (n < 0 || n > BitString.MAXLEN) { 
            throw new LengthOutOfBoundsException(); 
        } 
        this.n = n; 
    } 

    /** Sets the number of BitStrings readable from a file using 
        decodeBitString(). */ 
    public final void setDecodeLimit(long limit) { 
        decodeLimit = limit; 
    } 

    /** Returns the number of bits read by this BitStringReader. */ 
    public final long bitsRead() { 
        return (long)bufFilled * (long)BUF_SIZE * 8L + (long)bytePos * 8L + 
               7L - (long)bitPos + (long)peekSizeAdj * 8L; 
    } 

    /** Returns the filename of the file being read by this 
        BitStringReader. */ 
    public final String filename() { 
        return filename; 
    } 


    /** 
     * Returns a BitString that represents the next n bits of the file 
     * being read by this BitStringReader. n is an already set instance 
     * variable of this BitStringReader. Thus, this is the default read 
     * operation which is useful if you need to read fixed-length bit 
     * strings from a source file. Note: if there are only x bits left in 
     * the file where x < n then a BitString of length x will be returned. 
     * Thus, a returned BitString of length < n is a signal that the EOF 
     * has been reached. Further calls to this method after EOF will 
     * return BitStrings of length 0. 
     */ 
    public final BitString readBitString() throws IOException { 
        return readBitString(n); 
    } 

    /** 
     * Returns a BitString that represents the next len bits of the file 
     * being read by this BitStringReader. Note: if there are only x bits 
     * left in the file where x < len then a BitString of length x will be 
     * returned. Thus, a returned BitString of length < len is a signal 
     * that the EOF has been reached. Further calls to this method after 
     * EOF will return BitStrings of length 0. 
     */ 
    public BitString readBitString(int len) 
            throws 
                IOException, 
                LengthOutOfBoundsException 
    { 
        int bitsNeeded = len; 
        BitString temp = scratch; 
        BitString bs = new BitString((short)bitsNeeded); 
        while (bitsNeeded > 0) { 
            if (bytePos == endOfBuf) { 
                // at the end of the current buffer 
                endOfBuf = inStream.read(buf); 
                if (endOfBuf == -1) { 
                    endOfBuf = bytePos; 
                    bs.reInit(len - bitsNeeded, bs.bitPattern()); 
                    return bs; 
                } else { 
                    bufFilled++; 
                    bytePos = 0; 
                } 
            } else if (bitsNeeded > bitPos) { 
                // need all of the current byte 
                int pieceLen = bitPos + 1; 
                bs.lShift(pieceLen, false); 
                temp.reInit(pieceLen, buf[bytePos] & MASK_R[bitPos]); 
                bs.or(temp); 
                bitsNeeded -= pieceLen; 
                bytePos++; 
                bitPos = 7; 
            } else { 
                // need only some of the current byte 
                bs.lShift(bitsNeeded, false); 
                int t = (buf[bytePos] & MASK_R[bitPos]) >>> (bitPos - 
                        bitsNeeded + 1); 
                temp.reInit(bitsNeeded, t); 
                bs.or(temp); 
                bitPos -= bitsNeeded; 
                bitsNeeded = 0; 
            } 
        } 
        return bs; 
    } 

    /** 
     * This method is identical in functionality to readBitString(len) 
     * except that the underlying state of this BitStringReader remains 
     * unchanged after each call. Thus, multiple calls to this method are 
     * guaranteed to return the same result so long as no intermediate 
     * calls to readBitString or decodeBitString are made. This method is 
     * useful to determine what is coming up in the input stream without 
     * actually advancing the input stream pointer. 
     */ 
    public BitString peekBitString(int len) 
            throws 
                IOException, 
                LengthOutOfBoundsException 
    { 
        int bitsNeeded = len; 
        BitString temp = scratch; 
        BitString bs = new BitString((short)bitsNeeded); 
        int bitPos = this.bitPos; 
        int bytePos = this.bytePos; 
        int endOfBuf = this.endOfBuf; 

        while (bitsNeeded > 0) { 
            if (bytePos == endOfBuf) { 
                // at the end of the current buffer 
                int bytesFromOldBuf = this.endOfBuf - this.bytePos; 
                System.arraycopy(buf, this.bytePos, buf, 0, 
                                 bytesFromOldBuf); 
                endOfBuf = inStream.read(buf, bytesFromOldBuf, BUF_SIZE 
                                         - bytesFromOldBuf) + 
                           bytesFromOldBuf; 
                bufFilled++; 
                peekSizeAdj -= (BUF_SIZE - this.bytePos); 
                if (endOfBuf < bytesFromOldBuf) { 
                    // no more bytes to read from source file 
                    this.endOfBuf = bytesFromOldBuf; 
                    this.bytePos = 0; 
                    bs.reInit(len - bitsNeeded, bs.bitPattern()); 
                    return bs; 
                } else { 
                    this.endOfBuf = endOfBuf; 
                    this.bytePos = 0; 
                    bytePos = bytesFromOldBuf; 
                } 
            } else if (bitsNeeded > bitPos) { 
                // need all of the current byte 
                int pieceLen = bitPos + 1; 
                bs.lShift(pieceLen, false); 
                temp.reInit(pieceLen, buf[bytePos] & MASK_R[bitPos]); 
                bs.or(temp); 
                bitsNeeded -= pieceLen; 
                bytePos++; 
                bitPos = 7; 
            } else { 
                // need only some of the current byte 
                bs.lShift(bitsNeeded, false); 
                int t = (buf[bytePos] & MASK_R[bitPos]) >>> (bitPos - 
                        bitsNeeded + 1); 
                temp.reInit(bitsNeeded, t); 
                bs.or(temp); 
                bitPos -= bitsNeeded; 
                bitsNeeded = 0; 
            } 
        } 
        return bs; 
    } 

    /** 
     * Reads lenOfShortestHufCode bits from the input stream and checks to 
     * see if this key is found in decodingMap. If it is then a BitString 
     * representing the value mapped to by the key is returned. Else, 
     * another bit is concatenated to the current key and another lookup 
     * in decodingMap is performed. If the new key is found in decodingMap 
     * then a BitString representing the value mapped to by the new key is 
     * returned. This process continues until a BitString is returned, 
     * the key exceeds BitString.MAXLEN (which throws a 
     * LengthOutOfBoundsException), or there are no more bits to read from 
     * the input stream (which throws an AssertionException). 
     */ 
    public final BitString decodeBitString(HashMap decodingMap, 
                                           int lenOfShortestHufCode) 
            throws 
                IOException 
    { 
        if (numDecoded == decodeLimit) { 
            return new BitString(); 
        } 

        BitString hufCode = readBitString(lenOfShortestHufCode); 
        BitString aBit; 
        while (!decodingMap.containsKey(hufCode)) { 
            aBit = readBitString(1); 
            Assertion.assert(aBit.length() == 1); 
            hufCode.concat(aBit); 
        } 
        numDecoded++; 

        // returning the VALUE that hufCode maps to in the decoding map 
        // (its class index) 
        // reusing hufCode for return value to avoid object creation 
        hufCode.reInit((BitString)decodingMap.get(hufCode)); 
        return hufCode; 
    } 


    // PRE:  0 < nBound <= BitString.MAXLEN 
    // POST: returns either, 
    //       1) a Lyndon word of length <= nBound 
    //       2) a contiguous string of 0's of length <= nBound 
    //       3) a contiguous string of 1's of length == nBound 
    //       4) any BitString of length <= nBound iff it is the tail 
    //          of the source file 
    //       5) a BitString of length 0 iff no bits remain in the 
    //          source file 
    public BitString readLyndonWord(int nBound) 
            throws 
                IOException 
    { 
        BitString sourceBits = peekBitString(nBound); 
        if (sourceBits.length() < nBound) { 
            // remainder bits, case 4 or 5 
            return readBitString(sourceBits.length()); 
        } 

        int returnLen = 0; 
        int lenSoFar = 1; 
        int aBit = sourceBits.lShift(false); 
        if (aBit == 0) { 
            // a leading 0, case 2 
            aBit = sourceBits.lShift(false); 
            while (aBit == 0 && lenSoFar < nBound) { 
                lenSoFar++; 
                aBit = sourceBits.lShift(false); 
            } 
            returnLen = lenSoFar; 
        } else { 
            // a leading 1, case 1 or 3 
            aBit = sourceBits.lShift(false); 
            while (aBit == 1 && lenSoFar < nBound) { 
                lenSoFar++; 
                aBit = sourceBits.lShift(false); 
            } 

            if (lenSoFar == nBound) { 
                // case 3 
                returnLen = lenSoFar; 
            } else { 
                // case 1 
                int numLead1s = lenSoFar++; 
                returnLen = lenSoFar; 
                int numNonLead1s = 0; 
                aBit = sourceBits.lShift(false); 
                while (numNonLead1s < numLead1s && lenSoFar < nBound) { 
                    lenSoFar++; 
                    if (aBit == 0) { 
                        returnLen = lenSoFar; 
                        numNonLead1s = 0; 
                    } else { 
                        numNonLead1s++; 
                    } 
                    aBit = sourceBits.lShift(false); 
                } 
            } 
        } 
        return readBitString(returnLen); 
    } 

    /** Resets the input stream of this BitStringReader to the first bit in the file. */
    public void reset() throws IOException {
        if (bufFilled > 0) {
            // the buffer was refilled at least once, so reopen the file
            inStream.close();
            inStream = null;
            System.runFinalization();
            System.gc();
            inStream = new FileInputStream(filename);
            endOfBuf = oldEndOfBuf = inStream.read(buf);
            bufFilled = 0;
            peekSizeAdj = 0;
        } else {
            endOfBuf = oldEndOfBuf;
        }
        bytePos = 0;
        bitPos = 7;
        numDecoded = 0;
    }

    /**
     * Closes this BitStringReader.
     */
    public void close() throws IOException {
        inStream.close();
    }

}


package thesis.compression.multi_tree;


import java.io.*;
import java.util.*;

/**
 * Objects of this class are used to write variable length BitStrings to a file.
 * Upon closing the BitStringWriter, any bits beyond the last byte boundary are
 * padded with enough 0 bits to make the total number of bits written to the
 * target file divisible by 8.
 */

public class BitStringWriter implements ByteMask { 

    private static final int BUF_SIZE = 8192;
    private byte[] buf;
    private FileOutputStream outStream;
    private int bufFilled;              // # times buf was filled
    private int bytePos;
    private int bitPos;
    private BitString scratch;

    /** Constructs a BitStringWriter object to write to file 'file'. */
    public BitStringWriter(String file) throws FileNotFoundException {
        buf = new byte[BUF_SIZE];
        outStream = new FileOutputStream(file);
        bufFilled = 0;
        bytePos = 0;
        bitPos = 7;
        scratch = new BitString();
    }

    /** Returns the number of bits written by this BitStringWriter. */
    public long bitsWrote() {
        return (long)bufFilled * (long)BUF_SIZE * 8L + (long)bytePos * 8L + 7L - (long)bitPos;
    }

    /** Writes BitString bStr to the output stream */
    public void writeBitString(BitString bStr) throws IOException {
        BitString bs = scratch;
        bs.reInit(bStr);
        while (bs.length() > 0) {
            if (bytePos == BUF_SIZE) {
                // buffer is full
                outStream.write(buf);
                Arrays.fill(buf, (byte)0);
                bytePos = 0;
                bufFilled++;
            } else if (bs.length() > bitPos) {
                // need all of the current byte
                int bitsAvail = bitPos + 1;
                byte b = (byte)(bs.bitPattern() >>> (bs.length() - bitsAvail));
                buf[bytePos] |= b;
                bs.setLength((short)(bs.length() - bitsAvail));
                bytePos++;
                bitPos = 7;
            } else {
                // need only some of the current byte
                byte b = (byte)(bs.bitPattern() << (bitPos - bs.length() + 1));
                buf[bytePos] |= b;
                bitPos -= bs.length();
                bs.setLength((short)0);
            }
        }
    }

    /**
     * Convenience method that writes all the BitStrings of BitString[] bsArr to
     * the output stream in index order
     */
    public void writeBitString(BitString[] bsArr) throws IOException {
        for (int i = 0; i < bsArr.length; i++) {
            writeBitString(bsArr[i]);
        }
    }

    /**
     * Closes this BitStringWriter.
     */
    public void close() throws IOException {
        // flush the filled bytes plus the partially filled last byte, if any
        outStream.write(buf, 0, bytePos + (bitPos == 7 ? 0 : 1));
        outStream.flush();
        outStream.close();
        outStream = null;
        System.runFinalization();
        System.gc();
    }

} 
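
The bit-packing scheme used by BitStringWriter (fill each byte from bit 7 downward, and let the untouched low bits of the last byte act as the 0 padding) can be illustrated in isolation. The following is only a minimal sketch under stated assumptions: the class BitPackSketch and its methods are hypothetical names, it writes to an in-memory array rather than a file, and it handles one bit at a time instead of the whole-byte shifts used in the listing above.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical illustration; not the thesis's BitStringWriter. */
final class BitPackSketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int current = 0; // byte being assembled
    private int bitPos = 7;  // next free bit position, 7 = most significant

    /** Appends the low 'length' bits of 'pattern', most significant bit first. */
    void write(int pattern, int length) {
        for (int i = length - 1; i >= 0; i--) {
            int bit = (pattern >>> i) & 1;
            current |= bit << bitPos;
            bitPos--;
            if (bitPos < 0) {
                // byte is complete, emit it and start a fresh one
                out.write(current);
                current = 0;
                bitPos = 7;
            }
        }
    }

    /** Flushes a partially filled last byte; its unused low bits are already 0. */
    byte[] close() {
        if (bitPos != 7) {
            out.write(current);
            current = 0;
            bitPos = 7;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        BitPackSketch w = new BitPackSketch();
        w.write(0b101, 3);
        w.write(0b11, 2);
        byte[] bytes = w.close();
        System.out.println(Integer.toHexString(bytes[0] & 0xFF));
    }
}
```

Writing the 3-bit string 101 followed by the 2-bit string 11 fills the top five bits of one byte; the remaining three bits stay 0, which is the padding behaviour the class comment describes.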


package thesis.compression.multi_tree; 


public interface ByteMask { 

    public static final int[] MASK_R = {0x00000001, 0x00000003, 0x00000007, 0x0000000F,
                                        0x0000001F, 0x0000003F, 0x0000007F, 0x000000FF};

    public static final int[] MASK_L = {0x000000FF, 0x000000FE, 0x000000FC, 0x000000F8,
                                        0x000000F0, 0x000000E0, 0x000000C0, 0x00000080};

} 
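
The two mask tables follow a simple pattern: entry i of MASK_R keeps the i + 1 rightmost bits of a byte, and entry i of MASK_L clears the i rightmost bits. As a hedged illustration (the class ByteMaskSketch and its method names are hypothetical and not part of the thesis code), the same values can be derived with shifts:

```java
/** Hypothetical illustration of how the ByteMask tables could be derived. */
final class ByteMaskSketch {

    /** Entry i keeps the (i + 1) least significant bits of a byte. */
    static int[] maskR() {
        int[] m = new int[8];
        for (int i = 0; i < 8; i++) {
            m[i] = (1 << (i + 1)) - 1;
        }
        return m;
    }

    /** Entry i keeps the (8 - i) most significant bits of a byte. */
    static int[] maskL() {
        int[] m = new int[8];
        for (int i = 0; i < 8; i++) {
            m[i] = (0xFF << i) & 0xFF;
        }
        return m;
    }

    public static void main(String[] args) {
        int[] r = maskR();
        int[] l = maskL();
        for (int i = 0; i < 8; i++) {
            System.out.println(i + ": " + Integer.toHexString(r[i]) + " " + Integer.toHexString(l[i]));
        }
    }
}
```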


package thesis.compression.huffman; 

import java.util.*; 
import java.io.*; 

import thesis.compression.multi_tree.*;

/**
 * An instance of this class compresses a sourceFile using standard Huffman encoding
 * and n-bit input symbols.
 */

public class HuffmanCompressionManager { 

    HashMap freqTable;
    BitStringReader sourceFile;
    BitStringWriter targetFile;
    int n;
    BitString remainder;
    BitString offset;


    public static void main(String[] args) {
        String usage = new String("USAGE: java HuffmanCompressionManager " +
            "<sourceFilename> <targetFilename> <n> <offset>");
        try {
            if (args.length != 4) {
                throw new ArrayIndexOutOfBoundsException();
            }
            int n = Integer.parseInt(args[2]);
            int offset = Integer.parseInt(args[3]);
            BitStringReader sourceFile = new BitStringReader(args[0], n);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            HuffmanCompressionManager hcm =
                new HuffmanCompressionManager(sourceFile, targetFile, n, offset);
            hcm.compress();
        } catch (NumberFormatException arg3Or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < n <= " + BitString.MAXLEN + " AND 0 <= offset < n");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public HuffmanCompressionManager(BitStringReader sourceFile,
                                     BitStringWriter targetFile,
                                     int n,
                                     int offsetLen)
        throws
            IOException,
            LengthOutOfBoundsException
    {
        // bounds check
        if (n < 2 || n > BitString.MAXLEN || offsetLen < 0 || offsetLen >= n) {
            throw new LengthOutOfBoundsException();
        }

        // initialize instance variables
        this.n = n;
        sourceFile.setN(n);
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        offset = sourceFile.readBitString(offsetLen);
        remainder = new BitString();
        freqTable = new HashMap();
    }

    public void compress()
        throws
            IOException
    {
        buildFreqTable(sourceFile, freqTable, n, remainder);
        compressSourceFile(targetFile, sourceFile, freqTable, n, remainder, offset);
    }

    public final long sizeAfter() {
        return targetFile.bitsWrote();
    }

    protected void buildFreqTable(/* IN  */ BitStringReader sourceFile,
                                  /* OUT */ HashMap freqTable,
                                  /* IN  */ int n,
                                  /* OUT */ BitString remainder)
        throws
            IOException
    {
        // count every full n-bit symbol; the final short read is the remainder
        BitString bs = sourceFile.readBitString();
        while (bs.length() == n) {
            if (freqTable.containsKey(bs)) {
                ((VonNoymanNode)(freqTable.get(bs))).incrFreq();
            } else {
                freqTable.put(bs, new VonNoymanNode(bs));
            }
            bs = sourceFile.readBitString();
        }
        remainder.reInit(bs);
    }


    protected void compressSourceFile(/* OUT    */ BitStringWriter targetFile,
                                      /* IN     */ BitStringReader sourceFile,
                                      /* IN/OUT */ HashMap freqTable,
                                      /* IN     */ int n,
                                      /* IN     */ BitString remainder,
                                      /* IN     */ BitString offset)
        throws
            IOException
    {
        // write compression header
        writeCompressionHeader(targetFile, sourceFile, n, freqTable, remainder, offset);

        HashMap encodingMap = freqTable;
        sourceFile.reset();
        sourceFile.readBitString(offset.length());  // throw away offset

        // write sourceFile to targetFile in compressed form
        BitString bs = sourceFile.readBitString();
        while (bs.length() == n) {
            bs = (BitString)encodingMap.get(bs);
            targetFile.writeBitString(bs);
            bs = sourceFile.readBitString();
        }
        Assertion.assert(remainder.equals(bs));
        targetFile.close();
    }


    protected void writeCompressionHeader(/* OUT    */ BitStringWriter targetFile,
                                          /* IN     */ BitStringReader sourceFile,
                                          /* IN     */ int n,
                                          /* IN/OUT */ HashMap freqTable,
                                          /* IN     */ BitString remainder,
                                          /* IN     */ BitString offset)
        throws
            IOException
    {
        // first entries in header of target file
        targetFile.writeBitString(new BitString((short)5, n));
        targetFile.writeBitString(new BitString((short)5, offset.length()));
        targetFile.writeBitString(offset);
        targetFile.writeBitString(new BitString((short)5, remainder.length()));
        targetFile.writeBitString(remainder);

        // convert the HashMap from a frequency table to an encoding map and write
        // header info for the Huffman tree
        processTree(freqTable, targetFile);

        // write numBitStringsRead as last entry in header
        long numBitStringsRead = (sourceFile.bitsRead() - offset.length()) / n;
        Assertion.assert(numBitStringsRead <= Integer.MAX_VALUE);
        targetFile.writeBitString(new BitString((short)32, (int)numBitStringsRead));

        System.out.println("****** HUFFMAN HEADER SIZE FOR " + sourceFile.filename() +
            " with N = " + n + " is " + (targetFile.bitsWrote() / 8) +
            " bytes ******");
    }

    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT    */ BitStringWriter targetFile)
        throws
            IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new BitString((short)5, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered bit string list for the compression header
        BitString[] freqOrderedBitStringList = new BitString[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedBitStringList[i++] = ((VonNoymanNode)it.next()).index;
        }

        // build leafs at level list for the compression header
        // NOTE: this algorithm trashes its TreeSet parameter AND the underlying
        //       HashMap upon which it is based
        BitString[] leafsAtLevelList = VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);

        // write this tree's portion of the compression header
        BitString leafsPerLevel = new BitString((short)5, leafsAtLevelList[0].length());
        BitString numLevels = new BitString((short)5, leafsAtLevelList.length);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(numLevels);
        targetFile.writeBitString(leafsAtLevelList);
        targetFile.writeBitString(freqOrderedBitStringList);

        // build frequency ordered list of Huffman codes
        BitString[] freqOrderedHuffmanCodes = VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // reuse the trashed HashMap as a (class index --> Huffman code) encoding map
        HashMap bitStringToHuffmanCodeMap = freqTable;
        for (int j = 0; j < freqOrderedBitStringList.length; j++) {
            bitStringToHuffmanCodeMap.put(freqOrderedBitStringList[j],
                                          freqOrderedHuffmanCodes[j]);
        }
    }

} 
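
The header written by processTree stores only the number of leaves at each tree level plus the symbols in frequency order; the codes themselves are regenerated from those counts. VonNoymanNode.getHuffmanCodes is not shown in this part of the listing, so the following is only a sketch of one standard way to do this (canonical code assignment), not necessarily the thesis's method. The class CanonicalCodesSketch and its methods are hypothetical names, and codes are represented as strings of '0' and '1' for readability.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of rebuilding prefix codes from per-level leaf counts. */
final class CanonicalCodesSketch {

    /**
     * leavesAtLevel[d] is the number of leaves at depth d (depth 0 is the root).
     * Returns the codes in order of increasing length, which matches a symbol
     * list sorted from most frequent to least frequent.
     */
    static List<String> codesFromLevelCounts(int[] leavesAtLevel) {
        List<String> codes = new ArrayList<String>();
        int code = 0;
        for (int depth = 0; depth < leavesAtLevel.length; depth++) {
            for (int k = 0; k < leavesAtLevel[depth]; k++) {
                codes.add(toBits(code, depth));
                code++;
            }
            code <<= 1; // descend one level: every remaining node gains a bit
        }
        return codes;
    }

    /** Renders the low 'width' bits of 'value' as a string, most significant first. */
    static String toBits(int value, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >> i) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(codesFromLevelCounts(new int[] {0, 1, 1, 2}));
    }
}
```

Because only the level counts and the ordered symbol list need to be transmitted, a scheme of this kind keeps the per-tree header small, which matters when many trees share one file.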


package thesis.compression.huffman2;

import java.util.*;
import java.io.*;

import thesis.compression.multi_tree.*;
import thesis.compression.huffman.*;

/**
 * This class is identical to HuffmanCompressionManager except that it uses a slightly
 * more efficient header encoding format. The performance difference between the two
 * techniques is typically insignificant (<0.1%).
 */
public class HuffmanCompressionManager2 extends HuffmanCompressionManager {

    public static void main(String[] args) {
        String usage = new String("USAGE: java HuffmanCompressionManager2 " +
            "<sourceFilename> <targetFilename> <n> <offset>");
        try {
            if (args.length != 4) {
                throw new ArrayIndexOutOfBoundsException();
            }
            int n = Integer.parseInt(args[2]);
            int offset = Integer.parseInt(args[3]);
            BitStringReader sourceFile = new BitStringReader(args[0], n);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            HuffmanCompressionManager2 hcm2 =
                new HuffmanCompressionManager2(sourceFile, targetFile, n, offset);
            hcm2.compress();
        } catch (NumberFormatException arg3Or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < n <= " + BitString.MAXLEN + " AND 0 <= offset < n");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public HuffmanCompressionManager2(BitStringReader sourceFile,
                                      BitStringWriter targetFile,
                                      int n,
                                      int offsetLen)
        throws
            IOException,
            LengthOutOfBoundsException
    {
        super(sourceFile, targetFile, n, offsetLen);
    }

    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT    */ BitStringWriter targetFile)
        throws
            IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new BitString((short)5, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered jWord list for the compression header
        BitString[] freqOrderedJWordList = new BitString[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedJWordList[i++] = ((VonNoymanNode)it.next()).index;
        }

        // build and trim leafs at level list for the compression header
        // (NOTE: the VonNoyman algorithm trashes its TreeSet parameter AND
        //  the underlying HashMap upon which it is based)
        BitString[] leafsAtLevelList = VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);
        BitString[] trimedLeafsAtLevelList = trimEmptyLeadLevels(leafsAtLevelList);
        int firstLevel = leafsAtLevelList.length - trimedLeafsAtLevelList.length;

        // write this tree's portion of the compression header
        BitString numNonEmptyLevels = new BitString((short)5, trimedLeafsAtLevelList.length);
        BitString firstNonEmptyLevel = new BitString((short)5, firstLevel);
        BitString leafsPerLevel = new BitString((short)5, trimedLeafsAtLevelList[0].length());
        targetFile.writeBitString(numNonEmptyLevels);
        targetFile.writeBitString(firstNonEmptyLevel);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(trimedLeafsAtLevelList);
        targetFile.writeBitString(freqOrderedJWordList);

        // build frequency ordered list of Huffman codes
        BitString[] freqOrderedHuffmanCodes = VonNoymanNode.getHuffmanCodes(leafsAtLevelList);
        Assertion.assert(freqOrderedJWordList.length == freqOrderedHuffmanCodes.length);

        // reuse the trashed HashMap as a (j-word --> Huffman code) encoding map
        HashMap JWordToHuffmanCodeMap = freqTable;
        for (int k = 0; k < freqOrderedJWordList.length; k++) {
            JWordToHuffmanCodeMap.put(freqOrderedJWordList[k],
                                      freqOrderedHuffmanCodes[k]);
        }
    }

    private final BitString[] trimEmptyLeadLevels(BitString[] leafsAtLevelList) {
        int numNonEmptyLevels = leafsAtLevelList.length;
        int i = 0;
        // skip the leading levels that hold no leaves
        while (leafsAtLevelList[i].bitPattern() == 0) {
            numNonEmptyLevels--;
            i++;
        }
        BitString[] newLeafsAtLevelList = new BitString[numNonEmptyLevels];
        System.arraycopy(leafsAtLevelList, i, newLeafsAtLevelList, 0, numNonEmptyLevels);
        return newLeafsAtLevelList;
    }

}


package thesis.compression.huffman; 

import java.util.*;
import java.io.*;

import org.omg.CORBA.IntHolder;

import thesis.compression.multi_tree.*;

/**
 * An instance of this class decompresses a file compressed by an instance of
 * HuffmanCompressionManager.
 */

public class HuffmanDecompressionManager { 

    private BitStringWriter targetFile;
    private BitStringReader sourceFile;
    private HashMap decodingMap;
    private int n;
    private BitString remainder;
    private BitString offset;


    public static void main(String[] args) {
        String usage = new String("USAGE: java HuffmanDecompressionManager " +
            "<sourceFilename> <targetFilename>");
        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }
            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new HuffmanDecompressionManager(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public HuffmanDecompressionManager(BitStringReader sourceFile,
                                       BitStringWriter targetFile)
        throws
            IOException
    {
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        n = (sourceFile.readBitString(5)).bitPattern();
        if (n == 0) {
            n = 32;
        }
        int lenOffset = (sourceFile.readBitString(5)).bitPattern();
        offset = sourceFile.readBitString(lenOffset);
        int lenRem = (sourceFile.readBitString(5)).bitPattern();
        remainder = sourceFile.readBitString(lenRem);
        decodingMap = new HashMap();
        decompressSourceFile(targetFile, sourceFile, n, decodingMap, remainder, offset);
    }


    protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                        /* IN  */ BitStringReader sourceFile,
                                        /* IN  */ int n,
                                        /* OUT */ HashMap decodingMap,
                                        /* IN  */ BitString remainder,
                                        /* IN  */ BitString offset)
        throws
            IOException
    {
        targetFile.writeBitString(offset);

        // build decoding map (Huffman code ==> original bit string)
        // remember length of shortest Huffman code
        IntHolder lengthOfShortHufCode = new IntHolder();
        buildDecodingMap(sourceFile, n, decodingMap, lengthOfShortHufCode);
        int lenOfShortHufCode = lengthOfShortHufCode.value;

        long numBitStringsCompressed = (sourceFile.readBitString(32)).bitPattern();
        sourceFile.setDecodeLimit(numBitStringsCompressed);
        sourceFile.setN(n);

        BitString bs = sourceFile.decodeBitString(decodingMap, lenOfShortHufCode);
        while (bs.length() > 0) {
            targetFile.writeBitString(bs);
            bs = sourceFile.decodeBitString(decodingMap, lenOfShortHufCode);
        }

        sourceFile.close();
        targetFile.writeBitString(remainder);
        targetFile.close();
    }


    protected void buildDecodingMap(/* IN  */ BitStringReader sourceFile,
                                    /* IN  */ int n,
                                    /* OUT */ HashMap decodingMap,
                                    /* OUT */ IntHolder lenOfShortCodeInTable)
        throws
            IOException
    {
        // build leafsAtLevelList for getHuffmanCodes method
        int lenLevel = (sourceFile.readBitString(5)).bitPattern();
        int numLevels = (sourceFile.readBitString(5)).bitPattern();
        BitString[] leafsAtLevelList = new BitString[numLevels];
        for (int i = 0; i < numLevels; i++) {
            leafsAtLevelList[i] = sourceFile.readBitString(lenLevel);
        }

        // getHuffmanCodes
        BitString[] freqOrderedHuffmanCodes = VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // fill decodingMap with (Huffman code --> unencoded bit string) mapping
        int numLeafs = freqOrderedHuffmanCodes.length;
        for (int i = 0; i < numLeafs; i++) {
            decodingMap.put(freqOrderedHuffmanCodes[i], sourceFile.readBitString(n));
        }

        // find length of shortest Huffman code
        for (int i = 0; i < numLevels; i++) {
            if (leafsAtLevelList[i].bitPattern() != 0) {
                lenOfShortCodeInTable.value = i;
                return;
            }
        }
    }

} 
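
Decoding in these classes works by reading as many bits as the shortest Huffman code, then appending one bit at a time until the candidate appears as a key in the decoding map. The sketch below illustrates that lookup loop on plain strings. It is a simplified illustration under stated assumptions, not the thesis's decodeBitString: PrefixDecodeSketch and decode are hypothetical names, bits and symbols are strings, and there is no decode limit.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of map-based prefix-code decoding. */
final class PrefixDecodeSketch {

    /**
     * Decodes 'bits' (a string of '0' and '1') using a map from code to symbol.
     * shortestLen is the length of the shortest code and must be at least 1.
     * Trailing bits that do not complete a code are ignored.
     */
    static String decode(String bits, Map<String, String> decodingMap, int shortestLen) {
        if (shortestLen < 1) {
            throw new IllegalArgumentException("shortestLen must be >= 1");
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos + shortestLen <= bits.length()) {
            int end = pos + shortestLen;
            // grow the candidate code one bit at a time until it is a known code
            while (!decodingMap.containsKey(bits.substring(pos, end))) {
                if (end == bits.length()) {
                    return out.toString(); // leftover bits do not form a code
                }
                end++;
            }
            out.append(decodingMap.get(bits.substring(pos, end)));
            pos = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("0", "a");
        map.put("10", "b");
        map.put("110", "c");
        map.put("111", "d");
        System.out.println(decode("010110111", map, 1));
    }
}
```

Starting at the shortest code length avoids lookups that cannot succeed; because the codes are prefix-free, the first match found while growing the candidate is the only possible one.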


package thesis.compression.huffman2; 

import java.util.*;
import java.io.*;

import org.omg.CORBA.IntHolder;

import thesis.compression.multi_tree.*;
import thesis.compression.huffman.*;

/**
 * Decompresses files compressed using HuffmanCompressionManager2.
 */
public class HuffmanDecompressionManager2 extends HuffmanDecompressionManager {

    public static void main(String[] args) {
        String usage = new String("USAGE: java HuffmanDecompressionManager2 " +
            "<sourceFilename> <targetFilename>");
        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }
            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new HuffmanDecompressionManager2(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public HuffmanDecompressionManager2(BitStringReader sourceFile,
                                        BitStringWriter targetFile)
        throws
            IOException
    {
        super(sourceFile, targetFile);
    }

    protected void buildDecodingMap(/* IN  */ BitStringReader sourceFile,
                                    /* IN  */ int n,
                                    /* OUT */ HashMap decodingMap,
                                    /* OUT */ IntHolder lenOfShortCodeInTable)
        throws
            IOException
    {
        int numNonEmptyLevels = (sourceFile.readBitString(5)).bitPattern();
        if (numNonEmptyLevels == 0) {
            return;
        }

        // build leafsAtLevelList for getHuffmanCodes method
        int firstNonEmptyLevel = (sourceFile.readBitString(5)).bitPattern();
        int lenLevel = (sourceFile.readBitString(5)).bitPattern();
        int numLevels = numNonEmptyLevels + firstNonEmptyLevel;
        BitString[] leafsAtLevelList = new BitString[numLevels];
        for (int i = 0; i < numLevels; i++) {
            if (i < firstNonEmptyLevel) {
                // levels trimmed from the header hold no leaves
                leafsAtLevelList[i] = new BitString((short)lenLevel, 0);
            } else {
                leafsAtLevelList[i] = sourceFile.readBitString(lenLevel);
            }
        }

        // getHuffmanCodes
        BitString[] freqOrderedHuffmanCodes = VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // fill decodingMap with (Huffman code --> unencoded bit string) mapping
        int numLeafs = freqOrderedHuffmanCodes.length;
        for (int i = 0; i < numLeafs; i++) {
            decodingMap.put(freqOrderedHuffmanCodes[i], sourceFile.readBitString(n));
        }

        // find length of shortest Huffman code
        for (int i = 0; i < numLevels; i++) {
            if (leafsAtLevelList[i].bitPattern() != 0) {
                lenOfShortCodeInTable.value = i;
                return;
            }
        }
    }


} 
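
The second header format saves space by omitting the leading tree levels that contain no leaves and recording only how many were dropped; the decompressor then re-creates them as zeros. A minimal sketch of that round trip on plain int arrays follows. TrimLevelsSketch and its methods are hypothetical names, and the thesis code operates on BitString arrays rather than ints.

```java
import java.util.Arrays;

/** Hypothetical illustration of trimming and restoring leading empty tree levels. */
final class TrimLevelsSketch {

    /** Index of the first level with at least one leaf, or counts.length if none. */
    static int firstNonEmpty(int[] counts) {
        int i = 0;
        while (i < counts.length && counts[i] == 0) {
            i++;
        }
        return i;
    }

    /** Drops the leading zero entries (compressor side). */
    static int[] trim(int[] counts) {
        return Arrays.copyOfRange(counts, firstNonEmpty(counts), counts.length);
    }

    /** Re-inserts 'first' leading zero entries (decompressor side). */
    static int[] restore(int first, int[] trimmed) {
        int[] full = new int[first + trimmed.length]; // new int[] is zero-filled
        System.arraycopy(trimmed, 0, full, first, trimmed.length);
        return full;
    }

    public static void main(String[] args) {
        int[] counts = {0, 0, 1, 3};
        int first = firstNonEmpty(counts);
        int[] trimmed = trim(counts);
        System.out.println(first + " " + Arrays.toString(trimmed)
            + " " + Arrays.toString(restore(first, trimmed)));
    }
}
```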


package thesis.compression.multi_tree; 


public class LengthOutOfBoundsException extends RuntimeException { 
    public LengthOutOfBoundsException() {
        super("BitString length must be between 0 and " +
              Integer.toString(BitString.MAXLEN) + " (inclusive).");
    }

    public LengthOutOfBoundsException(String str) {
        super(str);
    }

} 


package thesis.compression.lyndon; 

import java.util.*;
import java.io.*;

import thesis.compression.multi_tree.*;

/**
 * Preliminary investigation into the use of Lyndon words to build a dictionary
 * compression technique. Not used in final thesis. Looks promising enough to continue
 * researching.
 */

public class LyndonCompressionManager { 

    ArrayList dictionary;
    int[] freq;
    BitStringReader sourceFile;
    BitStringWriter targetFile;
    int nBound;
    BitString remainder;
    // BitString offset;


    public static void main(String[] args) {
        String usage = new String("USAGE: java LyndonCompressionManager " +
            "<sourceFilename> <targetFilename> <nBound>");
        try {
            if (args.length != 3) {
                throw new ArrayIndexOutOfBoundsException();
            }
            int nBound = Integer.parseInt(args[2]);
            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            LyndonCompressionManager lcm =
                new LyndonCompressionManager(sourceFile, targetFile, nBound);
            lcm.compress();
        } catch (NumberFormatException arg3NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < nBound <= " + BitString.MAXLEN);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public LyndonCompressionManager(BitStringReader sourceFile,
                                    BitStringWriter targetFile,
                                    int nBound)
        throws
            IOException,
            LengthOutOfBoundsException
    {
        // bounds check
        if (nBound < 2 || nBound > BitString.MAXLEN) {
            throw new LengthOutOfBoundsException();
        }

        // initialize instance variables
        this.nBound = nBound;
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        remainder = new BitString();
        dictionary = new ArrayList();
        freq = new int[100];
        Arrays.fill(freq, 1);
    }

    public void compress()
        throws
            IOException
    {
        loadDictionary(sourceFile, dictionary, nBound, remainder);
        // compressSourceFile(targetFile, sourceFile, freqTable, n, remainder, offset);
    }

    public void loadDictionary(BitStringReader sourceFile,
                               ArrayList dictionary,
                               int nBound,
                               BitString remainder)
        throws
            IOException
    {
        int i;
        BitString lynWord1 = null;
        BitString lynWord2 = sourceFile.readLyndonWord(nBound);
        while (lynWord2.length() != 0) {
            i = dictionary.indexOf(lynWord1);
            if (i == -1) {
                dictionary.add(lynWord1);
            } else {
                freq[i]++;
            }
            lynWord1 = lynWord2;
            lynWord2 = sourceFile.readLyndonWord(nBound);
        }
        remainder.reInit(lynWord1);
        printDictionary();
    }

    private void printDictionary() {
        int numBits = 0;
        int size = dictionary.size();
        // bits needed for a fixed-width index into the dictionary
        int logSize = (int)Math.ceil(Math.log(size) / Math.log(2));
        BitString entry;
        for (int i = 1; i < size; i++) {
            entry = (BitString)dictionary.get(i);
            System.out.print(i + ". " + (i < 10 ? " " : ""));
            System.out.print(entry);
            printSpaces(nBound - entry.length() + 3);
            System.out.println(freq[i]);
            numBits += (logSize * freq[i]);
        }
        System.out.println("remainder: " + remainder);
        System.out.println("source bits: " + sourceFile.bitsRead());
        System.out.println("target bits: " + numBits);
    }

    private void printSpaces(int num) {
        for (int i = 0; i < num; i++) {
            System.out.print(" ");
        }
    }

}
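
For readers unfamiliar with the term used throughout this class: a Lyndon word is a string that is strictly smaller, in lexicographic order, than every one of its proper rotations. The following sketch checks that property directly. It uses the usual ordering 0 < 1 on strings of '0' and '1', which is an assumption; the bit-level reader in the thesis may follow a different convention. LyndonSketch and isLyndon are hypothetical names.

```java
/** Hypothetical illustration of the Lyndon word property on binary strings. */
final class LyndonSketch {

    /** True iff w is non-empty and strictly smaller than all of its proper rotations. */
    static boolean isLyndon(String w) {
        if (w.isEmpty()) {
            return false;
        }
        for (int r = 1; r < w.length(); r++) {
            String rotation = w.substring(r) + w.substring(0, r);
            if (w.compareTo(rotation) >= 0) {
                return false; // some rotation is smaller than or equal to w
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"0", "01", "0011", "0101", "10"};
        for (String s : samples) {
            System.out.println(s + " " + isLyndon(s));
        }
    }
}
```

The rotation-by-rotation comparison takes quadratic time in the word length, which is acceptable for words bounded by a small nBound; it is meant to show the definition, not to be fast.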


MULTITREE COMPRESSION FORMAT


n                                                                    5 bits
bits used Bf to represent # leafs f at each level of Huffman tree    5 bits
bits used Bv to represent # of levels v in Huffman tree              5 bits
leafs in level 0 of Huffman tree                                     Bf bits
leafs in level 1 of Huffman tree                                     Bf bits
...
leafs in level v - 1 of Huffman tree                                 Bf bits
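
The multitree format carries one Huffman tree per rotation count plus one tree for the rotation counts themselves, because RTA splits every n-bit string into a necklace class and a rotation. As a hedged illustration of that split (not the thesis's Necklace class), the sketch below takes the smallest rotation of a pattern as the class representative and reports how many left rotations reach it. NecklaceSketch, rotateLeft and classAndRotation are hypothetical names, the choice of the smallest rotation as representative is an assumption, and the thesis maps each class to a compact index rather than using the representative directly.

```java
/** Hypothetical illustration of splitting an n-bit pattern into class and rotation. */
final class NecklaceSketch {

    /** Rotates the low n bits of x left by one position (1 <= n <= 31). */
    static int rotateLeft(int x, int n) {
        int mask = (1 << n) - 1;
        return ((x << 1) | (x >>> (n - 1))) & mask;
    }

    /**
     * Returns {representative, rotations}: the smallest value among the n
     * rotations of x, and the number of left rotations needed to reach it.
     * x must fit in n bits.
     */
    static int[] classAndRotation(int x, int n) {
        int best = x;
        int bestRot = 0;
        int cur = x;
        for (int r = 1; r < n; r++) {
            cur = rotateLeft(cur, n);
            if (cur < best) {
                best = cur;
                bestRot = r;
            }
        }
        return new int[] {best, bestRot};
    }

    public static void main(String[] args) {
        int[] result = classAndRotation(0b0110, 4);
        System.out.println(Integer.toBinaryString(result[0]) + " after " + result[1] + " rotations");
    }
}
```

All rotations of a pattern share one representative, so patterns that differ only by rotation fall into the same class and are distinguished solely by the small rotation count.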


package thesis.compression.multi_tree; 

import java.util.*; 
import java.io.*; 

/** 

* This class implements the RTA approach presented in the thesis. 
*/ 

public class MultitreeCompressionManager implements StatsCollected { 

    protected Necklace neck;
    protected HashMap[] hashes;
    protected BitStringReader sourceFile;
    protected BitStringWriter targetFile;
    protected BitString remainder;
    protected BitString offset;
    protected boolean lastTreeInspectable;
    protected long headerSize;

    public static void main(String[] args) {
        String usage = new String("USAGE: java MultitreeCompressionManager " +
            "<sourceFilename> <targetFilename> <n> <offset>");
        try {
            if (args.length != 4) {
                throw new ArrayIndexOutOfBoundsException();
            }
            int n = Integer.parseInt(args[2]);
            int offset = Integer.parseInt(args[3]);
            BitStringReader sourceFile = new BitStringReader(args[0], n);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            MultitreeCompressionManager mcm =
                new MultitreeCompressionManager(sourceFile, targetFile, n, offset);
            mcm.compress();
        } catch (NumberFormatException arg3Or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < n <= " + BitString.MAXLEN + " AND 0 <= offset < n");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeCompressionManager(BitStringReader sourceFile,
                                       BitStringWriter targetFile,
                                       int n,
                                       int offsetLen)
        throws
            IOException,
            LengthOutOfBoundsException
    {
        // bounds check
        if (n < 2 || n > BitString.MAXLEN || offsetLen < 0 || offsetLen >= n) {
            throw new LengthOutOfBoundsException();
        }

        // initialize instance variables
        neck = new Necklace(n);
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        sourceFile.setN(n);
        offset = sourceFile.readBitString(offsetLen);
        remainder = new BitString();
        lastTreeInspectable = false;
        headerSize = -1;
        // one table per rotation count (0...n - 1) plus one for the rotations themselves
        hashes = new HashMap[n + 1];
        for (int i = 0; i <= n; i++) {
            hashes[i] = new HashMap();
        }
    }

    public void compress()
        throws
            IOException
    {
        buildFreqTables();
        compressSourceFile();
    }


    // *********** methods required to implement StatsCollected interface ***********

    public int iBits() {return neck.rotLength();}
    public int jBits() {return neck.indexLength();}
    public int iTree() {return neck.n();}
    public int n() {return neck.n();}
    public int lenOffset() {return offset.length();}
    public String sourceFilename() {return sourceFile.filename();}

    public long sourceFileSize()
        throws
            StatsNotAvailableException
    {
        long sfs = sourceFile.bitsRead();
        if (sfs > offset.length()) {
            return sfs;
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public long targetFileSize()
        throws
            StatsNotAvailableException
    {
        long tfs = targetFile.bitsWrote();
        if (tfs > 0) {
            return tfs;
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public Iterator lastTreeInspector()
        throws
            StatsNotAvailableException
    {
        if (lastTreeInspectable) {
            return hashes[neck.n()].values().iterator();
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public long headerSize()
        throws
            StatsNotAvailableException
    {
        if (headerSize == -1) {
            throw new StatsNotAvailableException();
        } else {
            return headerSize;
        }
    }

    public int[] treeSizes()
        throws
            StatsNotAvailableException
    {
        int n = neck.n();
        if (hashes[n].isEmpty()) {
            throw new StatsNotAvailableException();
        } else {
            int[] numLeafs = new int[n];
            for (int i = 0; i < n; i++) {
                numLeafs[i] = hashes[i].isEmpty() ? 0 : hashes[i].size();
            }
            return numLeafs;
        }
    }

    public void buildFreqTables()
        throws
            IOException
    {
        buildFreqTables(sourceFile, neck, hashes, remainder);
        lastTreeInspectable = true;
    }

    public void compressSourceFile()
        throws
            IOException
    {
        lastTreeInspectable = false;
        compressSourceFile(targetFile, sourceFile, neck, hashes, remainder, offset);
    }

    // ************** end of methods for StatsCollected interface **************


    protected void buildFreqTables(/* IN  */ BitStringReader sourceFile,
                                   /* IN  */ Necklace neck,
                                   /* OUT */ HashMap[] freqTables,
                                   /* OUT */ BitString remainder)
        throws
            IOException
    {
        short indexLen = (short)neck.indexLength();
        short rotLen = (short)neck.rotLength();
        int n = neck.n();
        BitString bs = sourceFile.readBitString();

        // prescan entire source file and build n + 1 frequency tables
        while (bs.length() == n) {
            int bitPattern = bs.bitPattern();
            int classIndex = neck.bitStringToIndex(bitPattern);
            int rot = neck.bitStringToRots(bitPattern);

            // reuse BitString object to represent class index then
            // throw the class index into freqTable[0...n - 1]
            bs.reInit(indexLen, classIndex);
            if (freqTables[rot].containsKey(bs)) {
                ((VonNoymanNode)(freqTables[rot].get(bs))).incrFreq();
            } else {
                freqTables[rot].put(bs, new VonNoymanNode(bs));
            }

            // throw rotation into freqTable[n]
            BitString r = new BitString(rotLen, rot);
            if (freqTables[n].containsKey(r)) {
                ((VonNoymanNode)(freqTables[n].get(r))).incrFreq();
            } else {
                freqTables[n].put(r, new VonNoymanNode(r));
            }

            bs = sourceFile.readBitString();
        }
        remainder.reInit(bs);
    }

    protected void compressSourceFile(/* OUT    */ BitStringWriter targetFile,
                                      /* IN     */ BitStringReader sourceFile,
                                      /* IN     */ Necklace neck,
                                      /* IN/OUT */ HashMap[] freqTables,
                                      /* IN     */ BitString remainder,
                                      /* IN     */ BitString offset)
        throws
            IOException
    {
        int n = neck.n();

        // write compression header
        writeCompressionHeader(targetFile, sourceFile, n, freqTables, remainder, offset);

        HashMap[] encodingMap = freqTables;
        sourceFile.reset();
        sourceFile.readBitString(offset.length());  // throw away offset

        // write sourceFile to targetFile in compressed form
        int indexLen = neck.indexLength();
        int rotLen = neck.rotLength();
        BitString bs1 = sourceFile.readBitString();
        BitString bs2 = new BitString();
        while (bs1.length() == n) {
            int bitPattern = bs1.bitPattern();
            int classIndex = neck.bitStringToIndex(bitPattern);
            int rot = neck.bitStringToRots(bitPattern);

            // write Huffman code for rotations to targetFile
            // recycling BitString objects to avoid unnecessary object creation
            bs1.reInit(rotLen, rot);
            bs2.reInit((BitString)encodingMap[n].get(bs1));
            targetFile.writeBitString(bs2);

            // write Huffman code for class index to targetFile
            // recycling BitString objects
            bs1.reInit(indexLen, classIndex);
            bs2.reInit((BitString)encodingMap[rot].get(bs1));
            targetFile.writeBitString(bs2);

            bs1 = sourceFile.readBitString();
        }
        Assertion.assert(remainder.equals(bs1));
        targetFile.close();
    }


    protected void writeCompressionHeader(/* OUT    */ BitStringWriter targetFile,
                                          /* IN     */ BitStringReader sourceFile,
                                          /* IN     */ int n,
                                          /* IN/OUT */ HashMap[] freqTables,
                                          /* IN     */ BitString remainder,
                                          /* IN     */ BitString offset)
        throws IOException
    {
        // first entries in header of target file
        targetFile.writeBitString(new BitString(5, n));
        targetFile.writeBitString(new BitString(5, offset.length()));
        targetFile.writeBitString(offset);
        targetFile.writeBitString(new BitString(5, remainder.length()));
        targetFile.writeBitString(remainder);

        // convert each HashMap from a frequency table to an encoding map and
        // write header info for each Huffman tree
        for (int i = 0; i <= n; i++) {
            processTree(freqTables[i], targetFile);
        }

        // write numBitStringsRead as last entry in header
        long numBitStringsRead = (sourceFile.bitsRead() - offset.length()) / n;
        Assertion.assert(numBitStringsRead <= Integer.MAX_VALUE);
        targetFile.writeBitString(new BitString(32, (int)numBitStringsRead));

        headerSize = targetFile.bitsWrote();
    }

    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT    */ BitStringWriter targetFile)
        throws IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new BitString(10, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered class index list for the compression header
        BitString[] freqOrderedClassIndexList =
            new BitString[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedClassIndexList[i++] = ((VonNoymanNode)it.next()).index;
        }

        // build leafs at level list for the compression header
        // (NOTE: this algorithm trashes its TreeSet parameter AND the
        // underlying HashMap upon which it is based)
        BitString[] leafsAtLevelList =
            VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);

        // write this tree's portion of the compression header
        BitString leafsPerLevel = new BitString(5, leafsAtLevelList[0].length());
        BitString numLevels = new BitString(5, leafsAtLevelList.length);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(numLevels);
        targetFile.writeBitString(leafsAtLevelList);
        targetFile.writeBitString(freqOrderedClassIndexList);

        // build frequency ordered list of Huffman codes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // reuse the trashed HashMap as a (class index ==> Huffman code)
        // encoding map
        HashMap classIndexToHuffmanCodeMap = freqTable;
        for (int j = 0; j < freqOrderedClassIndexList.length; j++) {
            classIndexToHuffmanCodeMap.put(
                freqOrderedClassIndexList[j],
                freqOrderedHuffmanCodes[j]);
        }
    }

} 
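
The RTA listing above relies on `Necklace.bitStringToIndex` and `Necklace.bitStringToRots`, whose source is not part of this section. As a rough illustration only, the sketch below splits an n-bit pattern into a class representative and a rotation count. It assumes the representative is the numerically smallest rotation and that the rotation count is the number of circular right shifts that map the representative back to the original pattern (mirroring the `rShift(rot, true)` call in the decompressor). The class `NecklaceSketch` and its methods are hypothetical names, not the thesis's `Necklace` API, and the sketch returns the representative itself rather than a class index.

```java
/** Hypothetical illustration; this is not the thesis's Necklace class. */
final class NecklaceSketch {

    /** Rotates the low n bits of pattern right by k positions (1 <= n <= 31). */
    static int rotateRight(int pattern, int n, int k) {
        int mask = (1 << n) - 1;
        int p = pattern & mask;
        k %= n;
        if (k == 0) {
            return p;
        }
        return ((p >>> k) | (p << (n - k))) & mask;
    }

    /**
     * Returns {representative, rot}. The representative is assumed to be the
     * smallest rotation of pattern, and rot satisfies
     * rotateRight(representative, n, rot) == pattern.
     */
    static int[] split(int pattern, int n) {
        int best = pattern & ((1 << n) - 1);
        int bestRot = 0;
        for (int k = 1; k < n; k++) {
            // rotating left by k is the same as rotating right by n - k
            int candidate = rotateRight(pattern, n, n - k);
            if (candidate < best) {
                best = candidate;
                bestRot = k;
            }
        }
        return new int[] {best, bestRot};
    }

    public static void main(String[] args) {
        int[] parts = split(0b0110, 4);
        System.out.println("representative=" + Integer.toBinaryString(parts[0])
            + " rot=" + parts[1]);
    }
}
```

Under these assumptions, every member of a necklace class shares one representative, so the per-rotation frequency tables only need to store class indices.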


package thesis.compression.multi_tree2;

import java.util.*;
import java.io.*;

import thesis.compression.multi_tree.*;

/**
 * This class implements the ITA approach discussed in the thesis.
 */
public class MultitreeCompressionManager2 implements StatsCollected {

    protected HashMap[] hashes;
    protected BitStringReader sourceFile;
    protected BitStringWriter targetFile;
    protected int iBits;
    protected int jBits;
    protected BitString remainder;
    protected BitString offset;
    protected int iTree;
    protected boolean lastTreeInspectable;
    protected long headerSize;

    public static void main(String[] args) {
        String usage = "USAGE: java MultitreeCompressionManager2 " +
            "<sourceFilename> <targetFilename> <iBits> <jBits> <offset>";

        try {
            if (args.length != 5) {
                throw new ArrayIndexOutOfBoundsException();
            }

            int iBits = Integer.parseInt(args[2]);
            int jBits = Integer.parseInt(args[3]);
            int offset = Integer.parseInt(args[4]);

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            MultitreeCompressionManager2 mcm2 = new MultitreeCompressionManager2(
                sourceFile, targetFile, iBits, jBits, offset);
            mcm2.compress();
        } catch (NumberFormatException arg2or3or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < iBits + jBits <= " +
                BitString.MAXLEN + " AND 0 <= offset < iBits + jBits");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeCompressionManager2(BitStringReader sourceFile,
                                        BitStringWriter targetFile,
                                        int iBits,
                                        int jBits,
                                        int offsetLen)
        throws IOException, LengthOutOfBoundsException
    {
        // bounds check
        if (iBits < 1 || jBits < 1 || iBits + jBits > BitString.MAXLEN ||
            offsetLen < 0 || offsetLen >= iBits + jBits) {
            throw new LengthOutOfBoundsException();
        }

        // initialize instance variables
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;
        this.iBits = iBits;
        this.jBits = jBits;
        offset = sourceFile.readBitString(offsetLen);
        remainder = new BitString();
        iTree = (int)(Math.pow(2, iBits));
        lastTreeInspectable = false;
        headerSize = -1;

        hashes = new HashMap[iTree + 1];
        for (int i = 0; i <= iTree; i++) {
            hashes[i] = new HashMap();
        }
    }

    // *********** methods required to implement StatsCollected interface ***********

    public int iBits() {return iBits;}
    public int jBits() {return jBits;}
    public int n() {return iBits + jBits;}
    public int iTree() {return iTree;}
    public int lenOffset() {return offset.length();}
    public String sourceFilename() {return sourceFile.filename

    public long sourceFileSize()
        throws StatsNotAvailableException
    {
        long sfs = sourceFile.bitsRead();
        if (sfs > offset.length()) {
            return sfs;
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public long headerSize()
        throws StatsNotAvailableException
    {
        if (headerSize == -1) {
            throw new StatsNotAvailableException();
        } else {
            return headerSize;
        }
    }


    public long targetFileSize()
        throws StatsNotAvailableException
    {
        long tfs = targetFile.bitsWrote();
        if (tfs > 0) {
            return tfs;
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public Iterator lastTreeInspector()
        throws StatsNotAvailableException
    {
        if (lastTreeInspectable) {
            return hashes[iTree].values().iterator();
        } else {
            throw new StatsNotAvailableException();
        }
    }

    public int[] treeSizes()
        throws StatsNotAvailableException
    {
        if (hashes[iTree].isEmpty()) {
            throw new StatsNotAvailableException();
        } else {
            int[] numLeafs = new int[iTree];
            for (int i = 0; i < iTree; i++) {
                numLeafs[i] = hashes[i].isEmpty() ? 0 : hashes[i].size();
            }
            return numLeafs;
        }
    }

    public void buildFreqTables() throws IOException {
        buildFreqTables(sourceFile, hashes, iBits, jBits, iTree, remainder);
        lastTreeInspectable = true;
    }

    public void compressSourceFile() throws IOException {
        lastTreeInspectable = false;
        compressSourceFile(targetFile, sourceFile, hashes, iBits, jBits,
                           iTree, remainder, offset);
    }

    // ************** end of methods for StatsCollected interface ****************

    public void compress()
        throws IOException
    {
        buildFreqTables();
        compressSourceFile();
    }


    protected void buildFreqTables(/* IN  */ BitStringReader sourceFile,
                                   /* OUT */ HashMap[] freqTables,
                                   /* IN  */ int iBits,
                                   /* IN  */ int jBits,
                                   /* IN  */ int iTree,
                                   /* OUT */ BitString remainder)
        throws IOException
    {
        int lenWord = iBits + jBits;
        BitString peekString = sourceFile.peekBitString(lenWord);

        // prescan entire source file and build iTree + 1 frequency tables
        while (peekString.length() == lenWord) {
            BitString iWord = sourceFile.readBitString(iBits);
            BitString jWord = sourceFile.readBitString(jBits);

            // throw jWord into the table selected by iWord
            if (freqTables[iWord.bitPattern()].containsKey(jWord)) {
                ((VonNoymanNode)(freqTables[iWord.bitPattern()].get(jWord))).incrFreq();
            } else {
                freqTables[iWord.bitPattern()].put(jWord, new VonNoymanNode(jWord));
            }

            // throw iWord into freqTable[iTree]
            if (freqTables[iTree].containsKey(iWord)) {
                ((VonNoymanNode)(freqTables[iTree].get(iWord))).incrFreq();
            } else {
                freqTables[iTree].put(iWord, new VonNoymanNode(iWord));
            }

            peekString = sourceFile.peekBitString(lenWord);
        }

        remainder.reInit(peekString);

        // read and throw away remainder to keep bitsRead() accurate
        sourceFile.readBitString(peekString.length());
    }


    protected void compressSourceFile(/* OUT    */ BitStringWriter targetFile,
                                      /* IN     */ BitStringReader sourceFile,
                                      /* IN/OUT */ HashMap[] freqTables,
                                      /* IN     */ int iBits,
                                      /* IN     */ int jBits,
                                      /* IN     */ int iTree,
                                      /* IN     */ BitString remainder,
                                      /* IN     */ BitString offset)
        throws IOException
    {
        writeCompressionHeader(targetFile, sourceFile, iBits, jBits,
                               iTree, freqTables, remainder, offset);

        HashMap[] encodingMap = freqTables;         // reusing data structure
        sourceFile.reset();                         // read the file AGAIN
        sourceFile.readBitString(offset.length());  // throw away offset

        int lenWord = iBits + jBits;
        BitString peekString = sourceFile.peekBitString(lenWord);
        while (peekString.length() == lenWord) {
            BitString iWord = sourceFile.readBitString(iBits);
            BitString jWord = sourceFile.readBitString(jBits);

            // write Huffman code for iWord to targetFile
            targetFile.writeBitString((BitString)encodingMap[iTree].get(iWord));

            // write Huffman code for jWord to targetFile
            targetFile.writeBitString(
                (BitString)encodingMap[iWord.bitPattern()].get(jWord));

            peekString = sourceFile.peekBitString(lenWord);
        }

        Assertion.assert(remainder.equals(peekString));
        targetFile.close();
    }




    protected void writeCompressionHeader(/* OUT    */ BitStringWriter targetFile,
                                          /* IN     */ BitStringReader sourceFile,
                                          /* IN     */ int iBits,
                                          /* IN     */ int jBits,
                                          /* IN     */ int iTree,
                                          /* IN/OUT */ HashMap[] freqTables,
                                          /* IN     */ BitString remainder,
                                          /* IN     */ BitString offset)
        throws IOException
    {
        // first entries in header of target file
        targetFile.writeBitString(new BitString((short)5, iBits));
        targetFile.writeBitString(new BitString((short)5, jBits));
        targetFile.writeBitString(new BitString((short)5, offset.length()));
        targetFile.writeBitString(offset);
        targetFile.writeBitString(new BitString((short)5, remainder.length()));
        targetFile.writeBitString(remainder);

        // convert each HashMap from a frequency table to an encoding map and
        // write header info for each Huffman tree
        for (int i = 0; i <= iTree; i++) {
            processTree(freqTables[i], targetFile);
        }

        // write numBitStringsRead as last entry in header
        long numBitStringsRead =
            (sourceFile.bitsRead() - offset.length()) / (iBits + jBits);
        Assertion.assert(numBitStringsRead <= Integer.MAX_VALUE);
        targetFile.writeBitString(new BitString((short)32, (int)numBitStringsRead));

        headerSize = targetFile.bitsWrote();
    }

    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT    */ BitStringWriter targetFile)
        throws IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new BitString((short)10, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered j-word list for the compression header
        BitString[] freqOrderedJWordList = new BitString[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedJWordList[i++] = ((VonNoymanNode)it.next()).index;
        }

        // build leafs at level list for the compression header
        // (NOTE: this algorithm trashes its TreeSet parameter AND the
        // underlying HashMap upon which it is based)
        BitString[] leafsAtLevelList =
            VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);

        // write this tree's portion of the compression header
        BitString leafsPerLevel =
            new BitString((short)5, leafsAtLevelList[0].length());
        BitString numLevels = new BitString((short)5, leafsAtLevelList.length);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(numLevels);
        targetFile.writeBitString(leafsAtLevelList);
        targetFile.writeBitString(freqOrderedJWordList);

        // build frequency ordered list of Huffman codes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // reuse the trashed HashMap as a (j-word ==> Huffman code) encoding map
        HashMap JWordToHuffmanCodeMap = freqTable;
        for (int k = 0; k < freqOrderedJWordList.length; k++) {
            JWordToHuffmanCodeMap.put(freqOrderedJWordList[k],
                                      freqOrderedHuffmanCodes[k]);
        }
    }
}
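
To make the ITA dispersal step in `buildFreqTables` easier to follow, the sketch below performs the same bookkeeping on plain integers: each word of `iBits + jBits` bits is split into a leading i-word and a trailing j-word, the j-word is counted in the table selected by the i-word, and the i-word itself is counted in one extra table at index `iTree`. This is a minimal sketch, assuming words are already available as `int` values; `ItaDispersalSketch` is a hypothetical name and it uses `java.util` maps of counts in place of the thesis's `BitString` and `VonNoymanNode` classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of ITA dispersal on plain ints. */
final class ItaDispersalSketch {

    /**
     * Builds 2^iBits + 1 frequency tables. Tables 0 .. 2^iBits - 1 count
     * j-words, keyed by the i-word that preceded them; the last table
     * counts the i-words themselves.
     */
    static List<Map<Integer, Integer>> buildFreqTables(int[] words,
                                                       int iBits,
                                                       int jBits) {
        int iTree = 1 << iBits;
        List<Map<Integer, Integer>> tables = new ArrayList<>();
        for (int t = 0; t <= iTree; t++) {
            tables.add(new HashMap<>());
        }
        int jMask = (1 << jBits) - 1;
        for (int word : words) {
            int iWord = word >>> jBits;   // first iBits of the word
            int jWord = word & jMask;     // remaining jBits of the word
            tables.get(iWord).merge(jWord, 1, Integer::sum);
            tables.get(iTree).merge(iWord, 1, Integer::sum);
        }
        return tables;
    }

    public static void main(String[] args) {
        int[] words = {0b1001, 0b1010, 0b1001, 0b0111};
        System.out.println(buildFreqTables(words, 2, 2));
    }
}
```

Each table would then feed its own Huffman tree, which is why the header loop in `writeCompressionHeader` runs from 0 through `iTree` inclusive.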


package thesis.compression.multi_tree3;

import java.util.*;
import java.io.*;

import thesis.compression.multi_tree.*;
import thesis.compression.multi_tree2.*;

/**
 * This class is identical to MultitreeCompressionManager2 (ITA) except that
 * it uses a slightly more efficient header encoding format. The performance
 * difference between the two techniques is typically insignificant (<0.1%).
 */
public class MultitreeCompressionManager3
    extends MultitreeCompressionManager2
    implements StatsCollected
{
    public static void main(String[] args) {
        String usage = "USAGE: java MultitreeCompressionManager3 " +
            "<sourceFilename> <targetFilename> <iBits> <jBits> <offset>";

        try {
            if (args.length != 5) {
                throw new ArrayIndexOutOfBoundsException();
            }

            int iBits = Integer.parseInt(args[2]);
            int jBits = Integer.parseInt(args[3]);
            int offset = Integer.parseInt(args[4]);

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            MultitreeCompressionManager3 mcm3 = new MultitreeCompressionManager3(
                sourceFile, targetFile, iBits, jBits, offset);
            mcm3.compress();
        } catch (NumberFormatException arg2or3or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < iBits + jBits <= " +
                BitString.MAXLEN + " AND 0 <= offset < iBits + jBits");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeCompressionManager3(BitStringReader sourceFile,
                                        BitStringWriter targetFile,
                                        int iBits,
                                        int jBits,
                                        int offsetLen)
        throws IOException, LengthOutOfBoundsException
    {
        super(sourceFile, targetFile, iBits, jBits, offsetLen);
    }

    protected void processTree(/* IN/OUT */ HashMap freqTable,
                               /* OUT    */ BitStringWriter targetFile)
        throws IOException
    {
        // if the freq table is empty write minimal header info and return
        if (freqTable.isEmpty()) {
            targetFile.writeBitString(new BitString((short)5, 0));
            return;
        }

        // order all the VonNoymanNodes of this table based on frequency
        TreeSet freqOrderedVNN = new TreeSet(freqTable.values());

        // build a frequency ordered j-word list for the compression header
        BitString[] freqOrderedJWordList = new BitString[freqOrderedVNN.size()];
        Iterator it = freqOrderedVNN.iterator();
        int i = 0;
        while (it.hasNext()) {
            freqOrderedJWordList[i++] = ((VonNoymanNode)it.next()).index;
        }

        // build and trim leafs at level list for the compression header
        // (NOTE: the VonNoyman algorithm trashes its TreeSet parameter AND
        // the underlying HashMap upon which it is based)
        BitString[] leafsAtLevelList =
            VonNoymanNode.vonNoymanAlgorithm(freqOrderedVNN);
        BitString[] trimedLeafsAtLevelList =
            trimEmptyLeadLevels(leafsAtLevelList);
        int firstLevel = leafsAtLevelList.length - trimedLeafsAtLevelList.length;

        // write this tree's portion of the compression header
        BitString numNonEmptyLevels =
            new BitString((short)5, trimedLeafsAtLevelList.length);
        BitString firstNonEmptyLevel = new BitString((short)5, firstLevel);
        BitString leafsPerLevel =
            new BitString((short)5, trimedLeafsAtLevelList[0].length());
        targetFile.writeBitString(numNonEmptyLevels);
        targetFile.writeBitString(firstNonEmptyLevel);
        targetFile.writeBitString(leafsPerLevel);
        targetFile.writeBitString(trimedLeafsAtLevelList);
        targetFile.writeBitString(freqOrderedJWordList);

        // build frequency ordered list of Huffman codes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // reuse the trashed HashMap as a (j-word ==> Huffman code) encoding map
        HashMap JWordToHuffmanCodeMap = freqTable;
        for (int k = 0; k < freqOrderedJWordList.length; k++) {
            JWordToHuffmanCodeMap.put(freqOrderedJWordList[k],
                                      freqOrderedHuffmanCodes[k]);
        }
    }

    private final BitString[] trimEmptyLeadLevels(BitString[] leafsAtLevelList) {
        int numNonEmptyLevels = leafsAtLevelList.length;
        int i = 0;
        while (leafsAtLevelList[i].bitPattern() == 0) {
            numNonEmptyLevels--;
            i++;
        }

        BitString[] newLeafsAtLevelList = new BitString[numNonEmptyLevels];
        System.arraycopy(leafsAtLevelList, i, newLeafsAtLevelList, 0,
                         numNonEmptyLevels);
        return newLeafsAtLevelList;
    }

} 
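
The header saving in MultitreeCompressionManager3 comes from dropping the leading tree levels that hold no leaves and recording the index of the first non-empty level instead. The sketch below shows that trimming on a plain `int[]` of leaf counts per level. It is an illustration under assumptions: `TrimLevelsSketch` is a hypothetical name, the counts are `int` values rather than `BitString` objects, and, unlike the listing above, the loop also stops at the end of the array so an all-empty input yields an empty result.

```java
import java.util.Arrays;

/** Hypothetical illustration of trimming empty leading tree levels. */
final class TrimLevelsSketch {

    /** Index of the first level with at least one leaf, or length if none. */
    static int firstNonEmptyLevel(int[] leafCounts) {
        int i = 0;
        while (i < leafCounts.length && leafCounts[i] == 0) {
            i++;
        }
        return i;
    }

    /** Copy of leafCounts without its leading zero entries. */
    static int[] trimEmptyLeadLevels(int[] leafCounts) {
        return Arrays.copyOfRange(leafCounts,
                                  firstNonEmptyLevel(leafCounts),
                                  leafCounts.length);
    }

    public static void main(String[] args) {
        int[] counts = {0, 0, 1, 3, 2};
        System.out.println("first level = " + firstNonEmptyLevel(counts));
        System.out.println("trimmed = " + Arrays.toString(trimEmptyLeadLevels(counts)));
    }
}
```

A decoder can rebuild the full list by prepending `firstNonEmptyLevel` zero entries to the trimmed list.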


package thesis.compression.multi_tree4;

import java.util.*;
import java.io.*;

import thesis.compression.multi_tree.*;
import thesis.compression.multi_tree3.*;

/**
 * This class is identical to MultitreeCompressionManager3 except that it uses
 * fixed length iBit codes instead of Huffman codes from an iBit tree. It was
 * an experiment to see how much of an effect the iBit tree was having on our
 * overall compression results. This code helped us build our theoretical
 * model of ITA.
 */
public class MultitreeCompressionManager4
    extends MultitreeCompressionManager3
    implements StatsCollected
{
    public static void main(String[] args) {
        String usage = "USAGE: java MultitreeCompressionManager4 " +
            "<sourceFilename> <targetFilename> <iBits> <jBits> <offset>";

        try {
            if (args.length != 5) {
                throw new ArrayIndexOutOfBoundsException();
            }

            int iBits = Integer.parseInt(args[2]);
            int jBits = Integer.parseInt(args[3]);
            int offset = Integer.parseInt(args[4]);

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            MultitreeCompressionManager4 mcm4 = new MultitreeCompressionManager4(
                sourceFile, targetFile, iBits, jBits, offset);
            mcm4.compress();
        } catch (NumberFormatException arg2or3or4NotInt) {
            System.out.println(usage);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (LengthOutOfBoundsException paramsOutOfRange) {
            System.out.println("1 < iBits + jBits <= " +
                BitString.MAXLEN + " AND 0 <= offset < iBits + jBits");
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeCompressionManager4(BitStringReader sourceFile,
                                        BitStringWriter targetFile,
                                        int iBits,
                                        int jBits,
                                        int offsetLen)
        throws IOException, LengthOutOfBoundsException
    {
        super(sourceFile, targetFile, iBits, jBits, offsetLen);
    }


    protected void compressSourceFile(/* OUT    */ BitStringWriter targetFile,
                                      /* IN     */ BitStringReader sourceFile,
                                      /* IN/OUT */ HashMap[] freqTables,
                                      /* IN     */ int iBits,
                                      /* IN     */ int jBits,
                                      /* IN     */ int iTree,
                                      /* IN     */ BitString remainder,
                                      /* IN     */ BitString offset)
        throws IOException
    {
        writeCompressionHeader(targetFile, sourceFile, iBits, jBits,
                               iTree, freqTables, remainder, offset);

        HashMap[] encodingMap = freqTables;         // reusing data structure
        sourceFile.reset();                         // read the file AGAIN
        sourceFile.readBitString(offset.length());  // throw away offset

        int lenWord = iBits + jBits;
        BitString peekString = sourceFile.peekBitString(lenWord);
        while (peekString.length() == lenWord) {
            BitString iWord = sourceFile.readBitString(iBits);
            BitString jWord = sourceFile.readBitString(jBits);

            // write iWord (unencoded) to targetFile
            targetFile.writeBitString(iWord);

            // write Huffman code for jWord to targetFile
            targetFile.writeBitString(
                (BitString)encodingMap[iWord.bitPattern()].get(jWord));

            peekString = sourceFile.peekBitString(lenWord);
        }

        Assertion.assert(remainder.equals(peekString));
        targetFile.close();
    }


    protected void writeCompressionHeader(/* OUT    */ BitStringWriter targetFile,
                                          /* IN     */ BitStringReader sourceFile,
                                          /* IN     */ int iBits,
                                          /* IN     */ int jBits,
                                          /* IN     */ int iTree,
                                          /* IN/OUT */ HashMap[] freqTables,
                                          /* IN     */ BitString remainder,
                                          /* IN     */ BitString offset)
        throws IOException
    {
        // first entries in header of target file
        targetFile.writeBitString(new BitString((short)5, iBits));
        targetFile.writeBitString(new BitString((short)5, jBits));
        targetFile.writeBitString(new BitString((short)5, offset.length()));
        targetFile.writeBitString(offset);
        targetFile.writeBitString(new BitString((short)5, remainder.length()));
        targetFile.writeBitString(remainder);

        // convert each HashMap from a frequency table to an encoding map and
        // write header info for each Huffman tree (EXCLUDING the iTree)
        for (int i = 0; i < iTree; i++) {
            processTree(freqTables[i], targetFile);
        }

        // write numWordsRead as last entry in header
        long numWordsRead =
            (sourceFile.bitsRead() - offset.length()) / (iBits + jBits);
        Assertion.assert(numWordsRead <= Integer.MAX_VALUE);
        targetFile.writeBitString(new BitString((short)32, (int)numWordsRead));

        headerSize = targetFile.bitsWrote();
    }

} 


package thesis.compression.multi_tree;

import java.util.*;
import java.io.*;
import org.omg.CORBA.IntHolder;

/**
 * This class decompresses files compressed by the RTA approach presented in
 * the thesis.
 */
public class MultitreeDecompressionManager {

    private BitStringWriter targetFile;
    private BitStringReader sourceFile;
    private HashMap[] decodingMaps;
    private int n;
    private BitString remainder;
    private BitString offset;

    public static void main(String[] args) {
        String usage = new String("USAGE: java MultitreeDecompressionManager " +
            "<sourceFilename> <targetFilename>");

        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new MultitreeDecompressionManager(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }

    public MultitreeDecompressionManager(BitStringReader sourceFile,
                                         BitStringWriter targetFile)
        throws IOException
    {
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;

        // read the fixed header fields in the order the compressor wrote them
        n = (sourceFile.readBitString(5)).bitPattern();
        int lenOffset = (sourceFile.readBitString(5)).bitPattern();
        offset = sourceFile.readBitString(lenOffset);
        int lenRem = (sourceFile.readBitString(5)).bitPattern();
        remainder = sourceFile.readBitString(lenRem);

        decodingMaps = new HashMap[n + 1];
        for (int i = 0; i <= n; i++) {
            decodingMaps[i] = new HashMap();
        }

        decompressSourceFile(targetFile, sourceFile, n, decodingMaps,
                             remainder, offset);
    }


    protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                        /* IN  */ BitStringReader sourceFile,
                                        /* IN  */ int n,
                                        /* OUT */ HashMap[] decodingMaps,
                                        /* IN  */ BitString remainder,
                                        /* IN  */ BitString offset)
        throws IOException
    {
        targetFile.writeBitString(offset);

        // build decoding maps (Huffman code ==> class index)
        // remember length of shortest Huffman code in each map
        IntHolder[] lenOfShortHufCode = new IntHolder[n + 1];
        for (int i = 0; i <= n; i++) {
            lenOfShortHufCode[i] = new IntHolder();
            buildDecodingMap(sourceFile, n, decodingMaps[i],
                             lenOfShortHufCode[i], i == n);
        }

        // build decoding table (necklace class index ==> necklace class)
        long numClasses = Necklace.numClasses(n);
        Assertion.assert(numClasses <= Integer.MAX_VALUE);
        int[] indexToClass = new int[(int)numClasses];
        Necklace.loadTable_indexToClass(n, indexToClass);

        long numBitStringsCompressed =
            (sourceFile.readBitString(32)).bitPattern();
        sourceFile.setDecodeLimit(2L * numBitStringsCompressed);
        sourceFile.setN(n);

        BitString r = null, c = null;
        int rot = -1;

        // get first encoded rotation in sourceFile
        r = sourceFile.decodeBitString(decodingMaps[n],
                                       lenOfShortHufCode[n].value);

        // read an encoded rotation and class index from sourceFile
        // write an unencoded bitstring to targetFile
        while (r.length() > 0) {
            rot = r.bitPattern();

            // c is class index
            c = sourceFile.decodeBitString(decodingMaps[rot],
                                           lenOfShortHufCode[rot].value);

            // c is class
            c.reInit(n, indexToClass[c.bitPattern()]);

            // c is original unencoded bitstring !!!
            c.rShift(rot, true);

            // write c to the target file
            targetFile.writeBitString(c);

            // get next encoded rotation in sourceFile
            // a BitString of length 0 indicates end of sourceFile
            r = sourceFile.decodeBitString(decodingMaps[n],
                                           lenOfShortHufCode[n].value);
        }

        sourceFile.close();
        targetFile.writeBitString(remainder);
        targetFile.close();
    }


    protected void buildDecodingMap(/* IN  */ BitStringReader sourceFile,
                                    /* IN  */ int n,
                                    /* OUT */ HashMap decodingMap,
                                    /* OUT */ IntHolder lenOfShortCodeInTable,
                                    /* IN  */ boolean isRotMap)
        throws IOException
    {
        // build leafsAtLevelList for getHuffmanCodes method
        int lenLevel = (sourceFile.readBitString(5)).bitPattern();
        int numLevels = (sourceFile.readBitString(5)).bitPattern();
        BitString[] leafsAtLevelList = new BitString[numLevels];
        for (int i = 0; i < numLevels; i++) {
            leafsAtLevelList[i] = sourceFile.readBitString(lenLevel);
        }

        // getHuffmanCodes
        BitString[] freqOrderedHuffmanCodes =
            VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

        // fill decodingMap with (Huffman code ==> class index) mapping
        int lenIndex = isRotMap ? Necklace.rotLength(n) :
                                  Necklace.indexLength(n);
        int numLeafs = freqOrderedHuffmanCodes.length;
        for (int i = 0; i < numLeafs; i++) {
            decodingMap.put(freqOrderedHuffmanCodes[i],
                            sourceFile.readBitString(lenIndex));
        }

        // find length of shortest Huffman code
        for (int i = 0; i < numLevels; i++) {
            if (leafsAtLevelList[i].bitPattern() != 0) {
                lenOfShortCodeInTable.value = i;
                return;
            }
        }
    }

} 
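
`buildDecodingMap` rebuilds the Huffman codes from nothing more than the number of leaves at each tree level, via `VonNoymanNode.getHuffmanCodes`, whose source is not shown in this section. One standard way to do this is a canonical code assignment, sketched below. This is an assumption about the general technique, not a statement of how `getHuffmanCodes` assigns its codes: `LevelCodesSketch` is a hypothetical name, level `i` is taken to hold codes of length `i`, level 0 (the root) is assumed to hold no leaves, and codes are returned as strings of `'0'` and `'1'`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: canonical prefix codes from leaves per level. */
final class LevelCodesSketch {

    /**
     * leafCounts[len] is the number of codes of length len (len <= 30).
     * Codes are returned shortest first, in increasing numeric order.
     */
    static List<String> codesFromLevels(int[] leafCounts) {
        List<String> codes = new ArrayList<>();
        int code = 0;
        for (int len = 1; len < leafCounts.length; len++) {
            code <<= 1;  // descend one level in the tree
            for (int c = 0; c < leafCounts[len]; c++) {
                codes.add(toBits(code, len));
                code++;  // next unused node on this level
            }
        }
        return codes;
    }

    /** Renders the low len bits of value, most significant bit first. */
    static String toBits(int value, int len) {
        StringBuilder sb = new StringBuilder(len);
        for (int bit = len - 1; bit >= 0; bit--) {
            sb.append(((value >>> bit) & 1) == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(codesFromLevels(new int[] {0, 1, 1, 2}));
    }
}
```

Because the codes depend only on the per-level counts, the header needs to store just those counts plus the symbols in frequency order, which is the layout `processTree` writes.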


package thesis.compression.multi_tree2;

import java.util.*;
import java.io.*;
import org.omg.CORBA.IntHolder;

import thesis.compression.multi_tree.*;

/**
 * An instance of this class decompresses files compressed by
 * MultitreeCompressionManager2.
 */
public class MultitreeDecompressionManager2 {

    private BitStringWriter targetFile;
    private BitStringReader sourceFile;
    private HashMap[] decodingMaps;
    private int iBits;
    private int jBits;
    private int iTree;
    private BitString remainder;
    private BitString offset;

    public static void main(String[] args) {
        String usage = new String("USAGE: java MultitreeDecompressionManager2 " +
            "<sourceFilename> <targetFilename>");

        try {
            if (args.length != 2) {
                throw new ArrayIndexOutOfBoundsException();
            }

            BitStringReader sourceFile = new BitStringReader(args[0]);
            BitStringWriter targetFile = new BitStringWriter(args[1]);
            new MultitreeDecompressionManager2(sourceFile, targetFile);
        } catch (ArrayIndexOutOfBoundsException wrongNumArgs) {
            System.out.println(usage);
        } catch (Exception e) {
            e.printStackTrace(new PrintStream(System.out));
        } finally {
            System.exit(1);
        }
    }


    public MultitreeDecompressionManager2(BitStringReader sourceFile,
                                          BitStringWriter targetFile)
        throws IOException
    {
        this.sourceFile = sourceFile;
        this.targetFile = targetFile;

        // read the fixed header fields in the order the compressor wrote them
        iBits = (sourceFile.readBitString(5)).bitPattern();
        jBits = (sourceFile.readBitString(5)).bitPattern();
        int lenOffset = (sourceFile.readBitString(5)).bitPattern();
        offset = sourceFile.readBitString(lenOffset);
        int lenRem = (sourceFile.readBitString(5)).bitPattern();
        remainder = sourceFile.readBitString(lenRem);
        iTree = (int)(Math.pow(2, iBits));
        decodingMaps = new HashMap[iTree + 1];
        for (int i = 0; i <= iTree; i++) {
            decodingMaps[i] = new HashMap();
        }

        decompressSourceFile(targetFile, sourceFile, iBits, jBits,
                             iTree, decodingMaps, remainder, offset);
    }


    protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                        /* IN  */ BitStringReader sourceFile,
                                        /* IN  */ int iBits,
                                        /* IN  */ int jBits,
                                        /* IN  */ int iTree,
                                        /* OUT */ HashMap[] decodingMaps,
                                        /* IN  */ BitString remainder,
                                        /* IN  */ BitString offset)
        throws IOException
    {
        targetFile.writeBitString(offset);

        // build decoding maps (Huffman code ==> jWord)
        // remember length of shortest Huffman code in each map
        IntHolder[] lenOfShortHufCode = new IntHolder[iTree + 1];
        for (int i = 0; i <= iTree; i++) {
            lenOfShortHufCode[i] = new IntHolder();
            buildDecodingMap(sourceFile, iBits, jBits, decodingMaps[i],
                             lenOfShortHufCode[i], i == iTree);
        }

        long numBitStringsCompressed =
            (sourceFile.readBitString(32)).bitPattern();
        sourceFile.setDecodeLimit(2L * numBitStringsCompressed);

        BitString iWord = null, jWord = null;

        // get first encoded iWord in sourceFile
        iWord = sourceFile.decodeBitString(decodingMaps[iTree],
                                           lenOfShortHufCode[iTree].value);

        // read an encoded iWord and jWord from sourceFile
        // write a decoded iWord and jWord to targetFile
        while (iWord.length() > 0) {
            jWord = sourceFile.decodeBitString(
                decodingMaps[iWord.bitPattern()],
                lenOfShortHufCode[iWord.bitPattern()].value);

            targetFile.writeBitString(iWord);
            targetFile.writeBitString(jWord);

            iWord = sourceFile.decodeBitString(decodingMaps[iTree],
                                               lenOfShortHufCode[iTree].value);
        }

        sourceFile.close();
        targetFile.writeBitString(remainder);
        targetFile.close();
    }


protected void buildDecodingMap(/* IN */  BitStringReader sourceFile,
                                /* IN */  int iBits,
                                /* IN */  int jBits,
                                /* OUT */ HashMap decodingMap,
                                /* OUT */ IntHolder lenOfShortCodeInTable,
                                /* IN */  boolean isIWordMap)
throws IOException
{

// build leafsAtLevelList for getHuffmanCodes method 

int lenLevel = (sourceFile.readBitString (5)).bitPattern(); 

int numLevels = (sourceFile.readBitString(5)).bitPattern(); 
BitString[] leafsAtLevelList = new BitString[numLevels];
for (int i = 0; i < numLevels; i++) {

leafsAtLevelList[i] = sourceFile.readBitString(lenLevel);

} 

// getHuffmanCodes 

BitString[] freqOrderedHuffmanCodes =

VonNoymanNode.getHuffmanCodes(leafsAtLevelList);

// fill decodingMap with (Huffman code ==> jWord) mapping

int bits2Read = isIWordMap ? iBits : jBits; 
int numLeafs = freqOrderedHuffmanCodes.length; 
for (int i = 0; i < numLeafs; i++) { 

decodingMap.put(freqOrderedHuffmanCodes [i], 
sourceFile.readBitString(bits2Read)); 

} 

// find length of shortest Huffman code 
for (int i = 0; i < numLevels; i++) { 

if (leafsAtLevelList[i].bitPattern() != 0) {

lenOfShortCodeInTable.value = i;
return; 

} 

} 

} 

} 
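
The decoding loop above looks up each Huffman code in a HashMap and passes the length of the shortest code in that map to decodeBitString. The internals of BitStringReader.decodeBitString are not shown in this part of the listing, so the following is only a minimal sketch of one way such a lookup could work. The class PrefixDecodeSketch, its decode method, and the use of '0'/'1' character strings in place of BitString objects are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public final class PrefixDecodeSketch {

    /**
     * Decodes a string of '0'/'1' characters using a map from code word
     * to symbol. Prefixes shorter than shortestCodeLen are never looked up,
     * which is the saving the "shortest code" bookkeeping is assumed to buy.
     */
    static String decode(String bits, Map<String, Character> codeToSymbol,
                         int shortestCodeLen) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        while (start < bits.length()) {
            int end = start + shortestCodeLen;
            Character symbol = null;
            // grow the candidate prefix one bit at a time until it is a code
            while (end <= bits.length()
                    && (symbol = codeToSymbol.get(bits.substring(start, end))) == null) {
                end++;
            }
            if (symbol == null) {
                throw new IllegalArgumentException("no code matches at bit " + start);
            }
            out.append(symbol);
            start = end;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Character> codes = new HashMap<>();
        codes.put("0", 'a');
        codes.put("10", 'b');
        codes.put("11", 'c');
        System.out.println(decode("010110", codes, 1)); // prints "abca"
    }
}
```

Because Huffman codes are prefix free, the first prefix found in the map is always the intended code, so no backtracking is needed.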


package thesis.compression.multi_tree3; 


import java.util.*;

import java.io.*; 

import org.omg.CORBA.IntHolder; 

import thesis.compression.multi_tree.*; 

import thesis.compression.multi_tree2.*; 

/** 

* An instance of this class decompresses a file compressed by
MultitreeCompressionManager3.

*/

public class MultitreeDecompressionManager3
extends MultitreeDecompressionManager2 
{ 

public static void main(String[] args) { 

String usage = new String("USAGE: java MultitreeDecompressionManager3 " +
"<sourceFilename> <targetFilename>");

try { 

if (args.length != 2) { 

throw new ArrayIndexOutOfBoundsException (); 

} 

BitStringReader sourceFile = new BitStringReader(args[0]);
BitStringWriter targetFile = new BitStringWriter(args[1]);
new MultitreeDecompressionManager3(sourceFile, targetFile);
} catch (ArrayIndexOutOfBoundsException wrongNumArgs) { 

System.out.println(usage); 

} catch (Exception e) { 

e.printStackTrace(new PrintStream(System.out)); 

} finally { 

System.exit(1); 

} 

} 


public MultitreeDecompressionManager3(BitStringReader sourceFile,
                                      BitStringWriter targetFile)
throws IOException

{ 

super(sourceFile, targetFile); 

} 


protected void buildDecodingMap(/* IN */  BitStringReader sourceFile,
                                /* IN */  int iBits,
                                /* IN */  int jBits,
                                /* OUT */ HashMap decodingMap,
                                /* OUT */ IntHolder lenOfShortCodeInTable,
                                /* IN */  boolean isIWordMap)
throws IOException
{

int numNonEmptyLevels =
(sourceFile.readBitString(5)).bitPattern();
if (numNonEmptyLevels == 0) {
return;
}

// build leafsAtLevelList for getHuffmanCodes method
int firstNonEmptyLevel =
(sourceFile.readBitString(5)).bitPattern();
int lenLevel = (sourceFile.readBitString(5)).bitPattern();
int numLevels = numNonEmptyLevels + firstNonEmptyLevel;
BitString[] leafsAtLevelList = new BitString[numLevels];
for (int i = 0; i < numLevels; i++) {
if (i < firstNonEmptyLevel) {
leafsAtLevelList[i] = new BitString((short)lenLevel, 0);
} else {
leafsAtLevelList[i] =
sourceFile.readBitString(lenLevel);
}
}


// getHuffmanCodes 

BitString[] freqOrderedHuffmanCodes =
VonNoymanNode.getHuffmanCodes(leafsAtLevelList);


// fill decodingMap with (Huffman code ==> jWord) mapping
int bits2Read = isIWordMap ? iBits : jBits;
int numLeafs = freqOrderedHuffmanCodes.length; 
for (int i = 0; i < numLeafs; i++) { 

decodingMap.put(freqOrderedHuffmanCodes[i], 
sourceFile.readBitString(bits2Read)); 

} 

// find length of shortest Huffman code 
for (int i = 0; i < numLevels; i++) {

if (leafsAtLevelList[i].bitPattern() != 0) {

lenOfShortCodeInTable.value = i;
return; 

} 

} 

} 


} 
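
Both versions of buildDecodingMap rebuild the Huffman codes from nothing more than the number of leaves at each tree level, which is what keeps the file header small. The sketch below shows one way codes can be assigned from such a list of counts. It follows the same count-and-shift idea as getHuffmanCodes, but the class CanonicalCodesSketch, its method names, the use of plain int counts, and the shortest-first output order are assumptions made for illustration rather than the thesis implementation.

```java
import java.util.ArrayList;
import java.util.List;

public final class CanonicalCodesSketch {

    /** Left-pads the binary form of value with zeros to exactly len bits. */
    static String toBits(int value, int len) {
        if (len == 0) {
            return "";
        }
        StringBuilder sb = new StringBuilder(Integer.toBinaryString(value));
        while (sb.length() < len) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }

    /**
     * leavesAtLevel[i] is the number of leaves at depth i of the tree.
     * Returns one code per leaf, shortest codes first.
     */
    static List<String> codes(int[] leavesAtLevel) {
        List<String> result = new ArrayList<>();
        int value = 0;
        for (int level = 0; level < leavesAtLevel.length; level++) {
            for (int leaf = 0; leaf < leavesAtLevel[level]; leaf++) {
                result.add(toBits(value, level));
                value++;          // next code on the same level
            }
            value <<= 1;          // descend one level: append a 0 bit
        }
        return result;
    }

    public static void main(String[] args) {
        // one leaf at depth 1 and two leaves at depth 2
        System.out.println(codes(new int[] {0, 1, 2})); // prints [0, 10, 11]
    }
}
```

The counts must describe a valid binary tree (the leaf counts must satisfy the Kraft equality) for the resulting codes to be prefix free.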


package thesis.compression.multi_tree4; 


import java.util.*; 

import java.io.*;

import org.omg.CORBA.IntHolder; 

import thesis.compression.multi_tree.*; 

import thesis.compression.multi_tree3.*; 

/**

* An instance of this class decompresses a file compressed by
MultitreeCompressionManager4.

*/

public class MultitreeDecompressionManager4
extends MultitreeDecompressionManager3
{ 

public static void main(String[] args) { 

String usage = new String("USAGE: java MultitreeDecompressionManager4 " +
"<sourceFilename> <targetFilename>");


try { 

if (args.length != 2) { 

throw new ArrayIndexOutOfBoundsException(); 

} 

BitStringReader sourceFile = new BitStringReader(args[0]);
BitStringWriter targetFile = new BitStringWriter(args[1]);
new MultitreeDecompressionManager4(sourceFile, targetFile);
} catch (ArrayIndexOutOfBoundsException wrongNumArgs) { 

System.out.println(usage);

} catch (Exception e) { 

e.printStackTrace(new PrintStream(System.out)); 

} finally { 

System.exit(1); 
}


} 

public MultitreeDecompressionManager4(BitStringReader sourceFile, 

BitStringWriter targetFile) 

throws 

IOException

{ 

super(sourceFile, targetFile); 

} 


protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                    /* IN */  BitStringReader sourceFile,
                                    /* IN */  int iBits,
                                    /* IN */  int jBits,
                                    /* IN */  int iTree,
                                    /* OUT */ HashMap[] decodingMaps,
                                    /* IN */  BitString remainder,
                                    /* IN */  BitString offset)
throws IOException
{

targetFile.writeBitString(offset); 

// build decoding maps (Huffman code ==> jWord) 

// remember length of shortest Huffman code in each map
IntHolder[] lenOfShortHufCode = new IntHolder[iTree + 1];
for (int i = 0; i < iTree; i++) {

lenOfShortHufCode[i] = new IntHolder();

buildDecodingMap(sourceFile, iBits, jBits, decodingMaps[i],
lenOfShortHufCode[i], i == iTree);

} 

long numWordsToRead =

(sourceFile.readBitString(32)).bitPattern();

sourceFile.setDecodeLimit(numWordsToRead); 

BitString iWord = null, jWord = null;


// get first unencoded iWord in sourceFile 
iWord = sourceFile.readBitString(iBits); 

// read an unencoded iWord and an encoded jWord from sourceFile 
// write iWord and jWord to targetFile 
while (numWordsToRead >0) { 

jWord =

sourceFile.decodeBitString(decodingMaps[iWord.bitPattern()],
lenOfShortHufCode[iWord.bitPattern()].value);

targetFile.writeBitString(iWord); 
targetFile.writeBitString(jWord); 
numWordsToRead--; 

iWord = sourceFile.readBitString(iBits); 


} 
sourceFile.close();

targetFile.writeBitString(remainder);
targetFile.close();

} 

} 


package thesis.compression.multi_tree5; 


import java.util.*; 

import java.io.*;

import org.omg.CORBA.IntHolder; 

import thesis.compression.multi_tree.*; 

/**

* An instance of this class decompresses a file compressed by
MultitreeCompressionManager5.

*/

public class MultitreeDecompressionManager5
extends MultitreeDecompressionManager

{ 

private BitStringWriter targetFile; 
private BitStringReader sourceFile; 
private HashMap[] decodingMaps; 
private int n; 

private Bitstring remainder; 
private Bitstring offset; 


public static void main(String[] args) { 

String usage = new String("USAGE: java MultitreeDecompressionManager5 " +
"<sourceFilename> <targetFilename>");


try { 

if (args.length != 2) { 

throw new ArrayIndexOutOfBoundsException(); 

} 

BitStringReader sourceFile = new BitStringReader(args[0]); 
BitStringWriter targetFile = new BitStringWriter(args[1]); 
new MultitreeDecompressionManager5(sourceFile, targetFile);
} catch (ArrayIndexOutOfBoundsException wrongNumArgs) { 

System.out.println(usage); 

} catch (Exception e) { 

e.printStackTrace(new PrintStream(System.out)); 

} finally { 

System.exit(1); 

} 

} 
public MultitreeDecompressionManager5(BitStringReader sourceFile,
                                      BitStringWriter targetFile)
throws IOException

{ 

super(sourceFile, targetFile); 

} 


protected void decompressSourceFile(/* OUT */ BitStringWriter targetFile,
                                    /* IN */  BitStringReader sourceFile,
                                    /* IN */  int n,
                                    /* OUT */ HashMap[] decodingMaps,
                                    /* IN */  BitString remainder,
                                    /* IN */  BitString offset)
throws IOException
{

targetFile.writeBitString(offset); 

// build decoding maps (Huffman code ==> class index) 

// remember length of shortest Huffman code in each map 
IntHolder[] lenOfShortHufCode = new IntHolder[n + 1]; 
for (int i = 0; i <= n; i++) { 

lenOfShortHufCode[i] = new IntHolder();
buildDecodingMap(sourceFile, n, decodingMaps[i], 
lenOfShortHufCode[i], i == n); 

} 

// build decoding table (necklace class index ==> necklace class)

long numClasses = Necklace.numClasses(n); 

Assertion.assert(numClasses <= Integer.MAX_VALUE);
int[] indexToClass = new int[(int)numClasses]; 

Necklace.loadTable_indexToClass(n, indexToClass); 

long numBitStringsCompressed = 

(sourceFile.readBitString(32)).bitPattern(); 

sourceFile.setDecodeLimit(2L * numBitStringsCompressed); 
sourceFile.setN(n); 

BitString r = null, c = null;
int rot = -1; 

// get first encoded rotation in sourceFile 
r = sourceFile.decodeBitString(decodingMaps[n], 
lenOfShortHufCode[n].value); 

// read an encoded rotation and class index from sourceFile 
// write an unencoded bitstring to targetFile 
while (r.length() > 0) {

rot = r.bitPattern();

// c is class index 




c = sourceFile.decodeBitString(decodingMaps[0], 
lenOfShortHufCode[0].value); 

// c is class 

c.reInit((short)n, indexToClass[c.bitPattern()]);

// c is original unencoded bitstring !!!
c.rShift(rot, true); 

// write c to the target file 
targetFile.writeBitString(c); 

// get next encoded rotation in sourceFile 
// a Bitstring of length 0 indicates end of sourceFile 
r = sourceFile.decodeBitString(decodingMaps[n], 
lenOfShortHufCode[n].value); 

} 

sourceFile.close();

targetFile.writeBitString(remainder); 
targetFile.close(); 

} 

} 
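
The RTA decoder above recovers each source bit string from two decoded values: a necklace class and a rotation count, applied with a right shift that wraps around. The following sketch illustrates that round trip on plain int values. It is written under stated assumptions: the class RotationSketch and its methods are hypothetical, and the numerically largest rotation is used as the class representative purely for convenience, which need not match the representative chosen by the necklace algorithm of the thesis.

```java
public final class RotationSketch {

    /** Rotates the low n bits of bits right by k positions (1 <= n <= 30). */
    static int rotateRight(int bits, int n, int k) {
        int mask = (1 << n) - 1;
        bits &= mask;
        k %= n;
        if (k == 0) {
            return bits;
        }
        return ((bits >>> k) | (bits << (n - k))) & mask;
    }

    /**
     * Splits bits into {representative, rot} such that
     * rotateRight(representative, n, rot) == bits.
     * The representative here is the largest rotation (an assumption).
     */
    static int[] split(int bits, int n) {
        int best = rotateRight(bits, n, 0);
        int bestRot = 0;
        for (int k = 1; k < n; k++) {
            int candidate = rotateRight(bits, n, n - k); // left rotation by k
            if (candidate > best) {
                best = candidate;
                bestRot = k;
            }
        }
        return new int[] {best, bestRot};
    }

    /** Inverse of split: rotates the representative back into place. */
    static int join(int representative, int n, int rot) {
        return rotateRight(representative, n, rot);
    }

    public static void main(String[] args) {
        int[] parts = split(0b0011, 4);
        // 0011 belongs to the class of 1100 and is 2 right rotations away
        System.out.println(parts[0] + " " + parts[1]);       // prints "12 2"
        System.out.println(join(parts[0], 4, parts[1]));     // prints "3"
    }
}
```

All n rotations of a bit string share one representative, so the representative carries less information than the original string; the rotation count carries the rest.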


package thesis.compression.multi_tree; 


import java.util.*; 
import java.lang.*;
import java.io.*; 


/** 

* This class contains all the supporting methods needed to implement
the RTA approach

* presented in the thesis. See chapter 4 for an explanation of the r
and i values

* referred to throughout the code.

*/ 

public class Necklace { 

private final int [] bitStringToIndex; 
private final byte[] bitStringToRots; 
private final int[] indexToClass; 
private final int n; 
private final long numClasses; 
private final int indexLen; 
private final int rotLen; 

/** Returns the length in bits of the i associated with a symbol of 
length n >= 1. */ 

public static final int indexLength(int n) {
return n - rotLength(n) + 1; 

} 
/** Returns the length in bits of the r associated with a symbol of
length n >= 1. */ 

public static final int rotLength(int n) { 

return (int)Math.ceil(Math.log(n) / Math.log(2)); 

} 

/** Helper function for numClasses(n) */ 
public static int gcdEuclid(int a, int b) { 
if (b==0) { 
return a; 

} else { 

return gcdEuclid(b, a % b); 

} 

} 

/** Helper function for numClasses(n) */ 

public static final boolean areRelativelyPrime(int a, int b) { 
return gcdEuclid(a, b) == 1 ? true : false; 

} 


/** Helper function for numClasses(n) */ 

public static final int theta(int d) { 
int count = 0; 

for (int i = 1; i <= d; i++) { 

if (areRelativelyPrime(d, i)) { 

count++; 

} 

} 

return count; 

} 

/** 

* Uses the Burnside formula to calculate the exact number of 
classes for a given 

* value of n. 

*/ 

public static final long numClasses(int n) { 
long sum = 0; 

for (int d = 1; d <= n; d++) { 

if (n % d == 0) { 

sum += theta(d) * Math.pow(2, n / d); 

} 

} 

return sum / n; 

} 

/**

* The necklace algorithm. This method builds a table which 
allows the efficient 

* lookup of a necklace class based upon its index in a list of 
all such classes. 

* 

* PRECONDITION: indexToClass.length == numClasses(n) 

* POSTCONDITION: indexToClass contains all the necklace classes 
of length n 
* as given by the necklace algorithm described in
chapter 3 of the 

* thesis. 

*/ 

public static void loadTable_indexToClass(/* IN */  int n,
                                          /* OUT */ int[] indexToClass)
{

int k = 0;                                  // index to store next class at

BitString c = new BitString((short)n, -1);  // the first necklace class

indexToClass[k++] = c.bitPattern();         // store it at index 0 of classes

while (c.bitPattern() != 0) {

int j = c.leastSig1();
c.clearBit(j);                              // replace least sig 1 with 0

int jj = j - 1;

int i = c.length() - 1;                     // most significant bit

while (jj >= 0) {                           // copy over, copy over, ...

c.assignBit(jj, c.bitAt(i));

jj--;
i--;

}

if ((n % (n - j)) == 0) {                   // j check

indexToClass[k++] = c.bitPattern();

}

} 

} 

/** 

* This method builds two tables of size 2^n. The
bitStringToIndex table allows 

* the efficient lookup of the class mapped to by any bit string 
of length n. The 

* bitStringToRots table allows the efficient lookup of the number 
of rotations 

* required to map any bit string of length n to its necklace 
class. 

*/ 

public static void loadTables_bitStringToIndex_bitStringToRots(
        /* IN */  int n,
        /* IN */  int[] indexToClass,
        /* IN */  long numClasses,
        /* OUT */ int[] bitStringToIndex,
        /* OUT */ byte[] bitStringToRots)
{

for (int i = 0; i < numClasses; i++) {

BitString bs = new BitString(n, indexToClass[i]);
int j = 0;
do {

bitStringToIndex[bs.bitPattern()] = i;
bitStringToRots[bs.bitPattern()] = (byte)j;

j++;

bs.rShift(true);

} while (bs.bitPattern() != indexToClass[i]);

} 

} 

/** 

* Constructs a Necklace object of length n. This object provides 
access to all 

* the tables and methods needed to efficiently work with binary 
necklaces of length n. 

*/ 

public Necklace(int n) { 
this.n = n; 

numClasses = numClasses(n); 

Assertion.assert(numClasses <= Integer.MAX_VALUE);
indexToClass = new int[(int)numClasses];
loadTable_indexToClass(n, indexToClass);

bitStringToIndex = new int[(int)(Math.pow(2, n))];
bitStringToRots = new byte[bitStringToIndex.length];
loadTables_bitStringToIndex_bitStringToRots(

n, indexToClass, numClasses, bitStringToIndex,
bitStringToRots);

indexLen = indexLength(n);
rotLen = rotLength(n);

} 

/** Returns the index of the necklace class mapped to by bitstring 

*/ 

public final int bitStringToIndex(int bitstring) { 
return bitStringToIndex[bitstring]; 

} 

/** Returns the # of rotations required to map bitstring to its 
necklace class */ 

public final byte bitStringToRots(int bitstring) { 
return bitStringToRots[bitstring]; 

} 

/** Returns the necklace class at the given index. */ 
public final int indexToClass(int index) { 
return indexToClass[index]; 

} 

/** Returns the value of n being used by this Necklace object */ 
public final int n() { 
return n; 

} 




/** Returns the number of classes associated with this Necklace 
object */ 

public final long numClasses() {
return numClasses; 

} 

/** Returns the length in bits of the r's associated with this 
Necklace object */ 

public final int rotLength() {
return rotLen; 

} 

/** Returns the length in bits of the i's associated with this 
Necklace object */ 

public final int indexLength() { 

return indexLen; 

} 

} 
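
The numClasses method above counts binary necklace classes with the Burnside formula, N(n) = (1/n) * sum over divisors d of n of phi(d) * 2^(n/d), where phi is Euler's totient (called theta in the listing). The sketch below restates that computation in integer arithmetic only, so that no double-valued Math.pow result is involved. The class name NecklaceCountSketch and its method names are assumptions made for illustration; the sketch is valid for 1 <= n <= 62.

```java
public final class NecklaceCountSketch {

    static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }

    /** Euler's totient: how many i in 1..d are relatively prime to d. */
    static int totient(int d) {
        int count = 0;
        for (int i = 1; i <= d; i++) {
            if (gcd(d, i) == 1) {
                count++;
            }
        }
        return count;
    }

    /** Number of binary necklace classes of length n (1 <= n <= 62). */
    static long numClasses(int n) {
        long sum = 0;
        for (int d = 1; d <= n; d++) {
            if (n % d == 0) {
                sum += totient(d) * (1L << (n / d)); // phi(d) * 2^(n/d)
            }
        }
        return sum / n; // the sum is always divisible by n
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n++) {
            System.out.println(n + " -> " + numClasses(n));
        }
    }
}
```

For n = 4 the divisors are 1, 2 and 4, giving (1*16 + 1*4 + 2*2) / 4 = 6 classes: 0000, 0001, 0011, 0101, 0111 and 1111 up to rotation.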


package thesis.compression.multi_tree; 


import java.util.*; 
import java.io.*;

import thesis.compression.huffman.*; 


/** 

* Objects of this class are used to collect compression statistics 
for any method that 

* implements the StatsCollected Interface. Each Stats object holds 
statistics for 

* one implementor of StatsCollected.

*/ 

public class Stats implements Comparable { 


private String filename;       // complete filename including extension
private String ext;            // filename extension
private int n;                 // word length used for compression
private int iBits;             // num bits to specify index of Huffman tree
private int jBits;             // num bits to specify leaf node of Huffman tree
private int iTree;             // highest tree index (iWord tree)
private int lenOffset;         // # of uncompressed leading bits in sourceFile

private long sizeBefore;       // in bits
private long sizeAfter;        // in bits

private double[] stringDistr;  // % of source file mapped to each Huffman tree
private int[] classDistr;      // # classes mapped to each Huffman tree
private long sizeAfterHuf;     // file size achieved using straight Huffman
private long headerSize;       // header size

public Stats(StatsCollected mcm) 

throws 

IOException,

StatsNotAvailableException 

{ 

n = mcm.n(); 

iBits = mcm.iBits();

jBits = mcm.jBits();

iTree = mcm.iTree();

lenOffset = mcm.lenOffset();

filename = mcm.sourceFilename();

int dotPos = filename.indexOf('.');

ext = (dotPos != -1 ? filename.substring(dotPos) :

mcm.buildFreqTables();

sizeBefore = mcm.sourceFileSize(); 

long numBitStrings = sizeBefore / n; // int div 

stringDistr = new double[iTree] ; 

Arrays.fill(stringDistr, 0) ; 

Iterator it = mcm.lastTreeInspector();
while (it.hasNext()) {

VonNoymanNode node = (VonNoymanNode)it.next();
stringDistr[node.index.bitPattern()] =

(double)node.freq / (double)numBitStrings * 100d;

}

classDistr = mcm.treeSizes();

mcm.compressSourceFile();

sizeAfter = mcm.targetFileSize(); 
headerSize = mcm.headerSize() ; 
mcm = null; 

} 

public Stats(StatsCollected mcm,
             HuffmanCompressionManager hcm)
throws IOException,
       StatsNotAvailableException
{

this(mcm);
hcm.compress();
sizeAfterHuf = hcm.sizeAfter();
hcm = null;

} 

public final int compareTo(Stats s) {
    if (ext.equals(s.ext)) {
        if (filename.equals(s.filename)) {
            if (n == s.n) {
                if (iBits == s.iBits) {
                    return lenOffset - s.lenOffset;
                } else {
                    return iBits - s.iBits;
                }
            } else {
                return n - s.n;
            }
        } else {
            return filename.compareTo(s.filename);
        }
    } else {
        return ext.compareTo(s.ext);
    }
}

public final int compareTo(Object obj) {
    return compareTo((Stats)obj);

}

public String toString() {

StringBuffer sb = new StringBuffer();

String nl = new String("\r\n");

sb.append("FILENAME : ");
sb.append(filename);
sb.append(nl);

sb.append("n : ");
sb.append(n);
sb.append(nl);

sb.append("iBits : ");
sb.append(iBits);
sb.append(nl);

sb.append("jBits : ");
sb.append(jBits);
sb.append(nl);

sb.append("offset : ");
sb.append(lenOffset);
sb.append(nl);
sb.append(nl);

sb.append("file size: "); 
sb.append(toKB(sizeBefore)); 
sb.append(nl); 
sb.append(nl); 
sb.append("header size: ");
sb.append(headerSize / 8);
sb.append(" bytes");
sb.append(nl);
sb.append(nl); 

if (sizeAfterHuf > 0) { 

sb.append("HUF size : "); 
sb.append(toKB(sizeAfterHuf)); 
sb.append(nl); 

sb.append("change : "); 

sb.append(change(sizeBefore, sizeAfterHuf)); 
sb.append(nl); 
sb.append(nl); 

} 

sb.append("MTC size : ");
sb.append(toKB(sizeAfter)); 
sb.append(nl); 

sb.append("change : "); 

sb.append(change(sizeBefore, sizeAfter)); 
sb.append(nl); 
sb.append(nl); 

sb.append("tree strings leafs"); 
sb.append(nl); 
double tmp; 

for (int i = 0; i < iTree; i++) {

sb.append(i);
sb.append((i < 10 ? " " : " "));

tmp = round(stringDistr[i], 0.1);
sb.append(tmp);
sb.append("%");
sb.append((tmp < 10 ? " " : " "));

sb.append(classDistr[i]);
sb.append(nl);

}

return sb.toString();

} 

private final String change(long orig, long cur) { 
long diff = cur - orig; 

return String.valueOf(round((double)diff / (double)orig * 100d,
0.1d)) + "%";

} 

private final String toKB (long bits) { 

return String.valueOf(round((double)bits / 8d / 1024d, 0.001d))

+ " KB"; 

} 


private final double round(double number, 

double placeValue ) { 
double recipPlaceValue = 1.0/placeValue;

return ( (int)(number*recipPlaceValue+0.5d) / recipPlaceValue

);

} 

} 
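
The change and round helpers above report the size difference as a signed percentage rounded to one decimal place. Since compression normally yields a negative change, it is worth noting that adding 0.5 and casting to int truncates toward zero, so negative values are rounded toward zero rather than to the nearest step. The sketch below places that cast-based rounding next to a Math.round variant so the difference can be seen. The class ChangeSketch and its method names are hypothetical and are not part of the thesis code.

```java
public final class ChangeSketch {

    /** Signed size change in percent; negative means the file shrank. */
    static double percentChange(long originalBits, long currentBits) {
        return (double) (currentBits - originalBits) / (double) originalBits * 100d;
    }

    /** Rounds half up to the nearest multiple of placeValue, for either sign. */
    static double roundTo(double number, double placeValue) {
        double recip = 1.0 / placeValue;
        return Math.round(number * recip) / recip;
    }

    /** Cast-based variant: adds 0.5, then truncates toward zero. */
    static double roundByCast(double number, double placeValue) {
        double recip = 1.0 / placeValue;
        return (int) (number * recip + 0.5d) / recip;
    }

    public static void main(String[] args) {
        double change = percentChange(10000, 7463); // about -25.37 percent
        System.out.println(roundTo(change, 0.1) + "%");
        System.out.println(roundByCast(change, 0.1) + "%");
    }
}
```

For a change of about -25.37% the Math.round variant gives -25.4 while the cast-based variant gives -25.3; for positive values the two agree.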

package thesis.compression.multi_tree; 

import java.util.*; 
import java.io.*; 

/** 

* This interface declares methods needed to collect statistics on the
various

* compression classes (like HuffmanCompressionManager and
MultitreeCompressionManager).

* Classes which implement this interface can be analyzed by the 
StatsManager. 

*/ 

public interface StatsCollected 

{ 

public int iBits();

public int jBits();

public int iTree();

public int n();

public int lenOffset();

public String sourceFilename();

public long sourceFileSize() throws StatsNotAvailableException;
public long targetFileSize() throws StatsNotAvailableException;
public Iterator lastTreeInspector() throws
StatsNotAvailableException;

public long headerSize() throws StatsNotAvailableException;
public int[] treeSizes() throws StatsNotAvailableException;

public void buildFreqTables() throws IOException;
public void compressSourceFile() throws IOException;

} 


package thesis.compression.multi_tree;

import java.util.*; 
import java.io.*; 
import java.text.*;

import thesis.compression.huffman.*; 
import thesis.compression.huffman2.*; 
import thesis.compression.multi_tree2.*;
import thesis.compression.multi_tree3.*;
import thesis.compression.multi_tree4.*;
import thesis.compression.multi_tree5.*;

/**

* This static class drives a simple text based menu which allows 
testing of all 

* implemented compression algorithms. The main method of this class 
is the standard 

* entry point into all the compression algorithms created for this 
thesis. 

*/ 

public class StatsManager { 

public static final String szOutDir =
"c:\\My Documents\\compressedFiles\\";

public static final String szInDir =
"c:\\My Documents\\filesToCompress\\";

private static TreeSet registeredStats = new TreeSet();


public static void main(String[] args) throws Exception { 


System.out.println("\nMultitree Compression Tool, May 6, 2001\n");

System.out.println("USAGE");

System.out.println("1) Create the directory " + szInDir +

System.out.println("2) Put all files to compress in specified directory.");

System.out.println("3) Follow on screen prompts.");

System.out.println("4) Compressed files & date stamped stats file at "
+ szOutDir);

BufferedReader inUser = new BufferedReader(new
InputStreamReader(System.in));

System.out.println("\nselect an MTC method");

System.out.println("1) rotational trees");

System.out.println("2) rotational trees (tree 0 & N only)");
System.out.println("3) indexed trees");

System.out.println("4) indexed trees + tight header scheme");
System.out.println("5) indexed trees + tight header scheme - " +
"Huffman tree for iWords");

System.out.print("Selection: ");

int method = Integer.parseInt(inUser.readLine());

System.out.println(); 

File outDir = new File(szOutDir); 

if (!outDir.exists() && !outDir.mkdir()) {

throw new IOException("Can't create " + szOutDir +
"!\nCreate the directory " +
"manually or change file permissions then restart this " +
"tool.\n");

} 

File inDir = new File(szInDir); 
if (!inDir.exists()) {

throw new FileNotFoundException( 

"Input directory " + szInDir + " does not exist!\n"); 

} 

String[] sourceFiles = inDir.list();
if (sourceFiles == null || sourceFiles.length == 0) {

throw new FileNotFoundException( 

"No source files found in '" + szInDir + "'!!!"); 

} 

String[] targetFiles = new String[sourceFiles.length] ; 
for (int i = 0; i < targetFiles.length; i++) {

int dotPos = sourceFiles[i].indexOf('.');
if (dotPos > 0) {

targetFiles[i] = sourceFiles[i].substring(0, dotPos);
} else {

targetFiles[i] = sourceFiles[i];

} 

} 


switch (method) { 

case 1: 

case 2: rotationalTrees(inUser, outDir, inDir, 

sourceFiles, targetFiles, method); 

break; 

case 3: 

case 4: 

case 5: indexedTrees(inUser, outDir, inDir, 

sourceFiles, targetFiles, method); 

break; 

default: throw new Exception("\nInvalid selection - " +
"restart the tool and try again.");

} 

} 

private static void rotationalTrees(BufferedReader inUser, 

File outDir, 

File inDir, 

String[] sourceFiles, 
String[] targetFiles, 
int mtcMethod) 

throws 

Exception 

{ 


System.out.print("Enter a lower bound for N: "); 
int lowN = Integer.parseInt(inUser.readLine());

System.out.print("Enter an upper bound for N: ");
int highN = Integer.parseInt(inUser.readLine());

boolean offsetsOn = false;

System.out.print("Turn offsets ON (y/n)? ");
if (inUser.readLine().equalsIgnoreCase("y")) {

offsetsOn = true; 

} 

boolean hufCompOn = false; 

System.out.print("Turn Huffman comparisons ON (y/n)? "); 
if (inUser.readLine().equalsIgnoreCase("y")) {
hufCompOn = true;


} 

System.out.print("\nTable building memory requirements for this " +
"run: ");

System.out.print((long)(5 * Math.pow(2, highN) + 4 *

Necklace.numClasses(highN)));

System.out.println(" bytes");

System.out.println("\nPROGRESS:");

for (int i = 0; i < sourceFiles.length * (highN - lowN + 1);

i++) { 

System.out.print ("."); 

} 

System.out.println(); 

for (int i = 0; i < sourceFiles.length; i++) {
for (int n = lowN; n <= highN; n++) { 

for (int offsetLen = 0; offsetLen == 0 || (offsetsOn && 
offsetLen < n); offsetLen++) { 

BitStringReader bsr = new BitStringReader(szInDir +

sourceFiles[i], n);

BitStringWriter bsw = new BitStringWriter(

szOutDir + targetFiles[i] + n + "_" + offsetLen

+ ".MTC" +

(mtcMethod == 1 ? "1" : "5"));

StatsCollected mcm; 
if (mtcMethod == 1) { 

mcm = new MultitreeCompressionManager(bsr, bsw, 

n, offsetLen); 

} else { 

mcm = new MultitreeCompressionManager5(bsr,

bsw, n, offsetLen); 

} 

if (hufCompOn) { 

BitStringReader bsr2 = new
BitStringReader(szInDir + sourceFiles[i], n);

BitStringWriter bsw2 = new BitStringWriter(
szOutDir + targetFiles[i] + n + "_" +

offsetLen + ".HUF");

HuffmanCompressionManager hcm =

new HuffmanCompressionManager(bsr2, bsw2,

n, offsetLen);

registerStats(new Stats(mcm, hcm));

} else { 

registerStats(new Stats(mcm)); 

} 

} 

System.out.print("."); 

} 

} 

System.out.println("\n");

outputStats(mtcMethod + " (rotational trees)");


} 
private static void indexedTrees(BufferedReader inUser,

File outDir, 

File inDir, 

String [] sourceFiles, 

String [] targetFiles, 
int mtcMethod) 

throws 

Exception 

{ 

    String userPair = null;
    int indexOfSpace = -1, userI = -1, userJ = -1, lowN = BitString.MAXLEN;
    ArrayList arrayI = new ArrayList(), arrayJ = new ArrayList();

    System.out.print("Enter a space delimited iBits jBits pair " +
        "(like 4 12 or 8 8): ");
    userPair = inUser.readLine();

    while (userPair != null && userPair.length() >= 3) {
        indexOfSpace = userPair.indexOf(' ');
        userI = Integer.parseInt(userPair.substring(0, indexOfSpace));
        userJ = Integer.parseInt(userPair.substring(indexOfSpace + 1));
        arrayI.add(new Integer(userI));
        arrayJ.add(new Integer(userJ));
        if (lowN > userI + userJ) {
            lowN = userI + userJ;
        }
        System.out.print("Enter another pair or <RETURN> to begin " +
            "compression: ");
        userPair = inUser.readLine();
    }

    System.out.print("Enter an offset bound (0 to " + (lowN - 1) + "): ");
    int offsetBound = Integer.parseInt(inUser.readLine());
    if (offsetBound < 0) {
        offsetBound = 0;
    } else if (offsetBound > lowN - 1) {
        offsetBound = lowN - 1;
    }

    boolean hufCompOn = false;
    System.out.print("Turn Huffman comparisons ON (y/n)? ");
    if (inUser.readLine().equalsIgnoreCase("y")) {
        hufCompOn = true;
    }

    System.out.println("\nPROGRESS:");
    for (int a = 0; a < sourceFiles.length * arrayI.size() *
            (offsetBound + 1); a++) {
        System.out.print(".");
    }
    System.out.println();

    for (int a = 0; a < sourceFiles.length; a++) {
        for (int b = 0; b < arrayI.size(); b++) {
            for (int c = 0; c <= offsetBound; c++) {
                userI = ((Integer)arrayI.get(b)).intValue();
                userJ = ((Integer)arrayJ.get(b)).intValue();
                BitStringReader bsr = new BitStringReader(szInDir +
                    sourceFiles[a]);
                BitStringWriter bsw = new BitStringWriter(
                    szOutDir + targetFiles[a] + userI + "_" + userJ
                    + "_" + c + ".MTC" +
                    (mtcMethod - 1));

                StatsCollected mcm;
                if (mtcMethod == 3) {
                    mcm = new MultitreeCompressionManager2(bsr,
                        bsw, userI, userJ, c);
                } else if (mtcMethod == 4) {
                    mcm = new MultitreeCompressionManager3(bsr,
                        bsw, userI, userJ, c);
                } else {
                    mcm = new MultitreeCompressionManager4(bsr,
                        bsw, userI, userJ, c);
                }

                if (hufCompOn) {
                    int n = userI + userJ;
                    BitStringReader bsr2 = new
                        BitStringReader(szInDir + sourceFiles[a], n);
                    BitStringWriter bsw2 = new BitStringWriter(
                        szOutDir + targetFiles[a] + n + "_" + c +
                        ".HUF");
                    HuffmanCompressionManager2 hcm2 =
                        new HuffmanCompressionManager2(bsr2, bsw2,
                            n, c);
                    registerStats(new Stats(mcm, hcm2));
                } else {
                    registerStats(new Stats(mcm));
                }

                System.out.print(".");
            }
        }
    }

    System.out.println("\n");

    outputStats(mtcMethod + " (indexed trees)");

} 

public static final void registerStats(Stats stats) { 
registeredStats.add(stats); 

} 

public static void outputStats(String method)
throws FileNotFoundException
{

SimpleDateFormat formatter = new
SimpleDateFormat("D_hh_mm_ss");

Date date = new Date();

String szDate = formatter.format(date);

PrintStream statsFile = new PrintStream(

new FileOutputStream(szOutDir + "stats" + szDate +

".txt"));

PrintStream out = System.out;

for (int i = 0; i < 2; i++) {

System.setOut(i == 0 ? out : statsFile);
System.out.println("Compression method: " + method);
System.out.println("Compressions performed: " +
registeredStats.size());

System.out.println();

Iterator it = registeredStats.iterator();
while (it.hasNext()) {

System.out.println(it.next());

System.out.println("-");

System.out.println();

} 

} 

statsFile.close(); 

System.setOut(out); 

} 

} 


package thesis.compression.multi_tree; 


public class StatsNotAvailableException extends Exception { 
public StatsNotAvailableException() { 

super("StatsCollected method calls out of sequence"); 

} 

public StatsNotAvailableException(String str) { 
super(str); 

} 


} 


package thesis.compression.multi_tree; 

import java.util.*; 
import java.io.*; 

/** 

* Poorly named class which implements the Neuman (not John VonNoyman)
technique of

* building a Huffman tree. The technique represents the leaves of a
Huffman tree using

* a decimal number. For example: 0.0235 indicates that the tree has
0 leaves at level

* 0, 0 leaves at level 1, 2 leaves at level 2, 3 leaves at level 3,
and 5 leaves at level

* 4. Using this technique a Huffman tree is built from a frequency
table by a

* series of decimal shift and addition operations which seems more
convenient than the

* actual construction of a binary tree.

*/ 

public class VonNoymanNode implements Comparable { 

public long freq; 
public int[] leafsAtLevel; 
public Bitstring index; 

/**

* Returns the decimal digits that represent the leaves at each 
level of the 

* Huffman tree built from the frequency table represented by 
freqOrderedVNN. 

* The decimal digits are returned as an array of integers as each 
digit may 

* exceed 9 depending upon the size of the resultant Huffman tree. 

* 

* PRECONDITION: freqOrderedVNN is not empty 

* POSTCONDITION: freqOrderedVNN is trashed 
*/ 

public static int [] vonNoymanAlgorithmInt( /* IN/OUT */ TreeSet 
freqOrderedVNN) 
throws 

NoSuchElementException 

{ 

// special case 

if (freqOrderedVNN.size 0 == 1) { 

return new int[] {O, l}; 

} 

// PLC: vnnl.leafSAtLevel contains the # of leafs at each 
level of Huffman tree 

VonNoymanNode vnnl, vnn2; 
do { 

vnnl = (VonNoymanNode) freqOrderedVNN. firstO; 

freqOrderedVNN.remove(vnnl); 

if (freqOrderedVNN.isEmpty0) break; 

vnn2 = (VonNoymanNode) freqOrderedVNN. f irst () ; 

freqOrderedVNN.remove(vnn2); 

vnnl.add(vnn2); 

freqOrderedVNN.add(vnnl); 

} while (true); 

return vnnl.leafsAtLevel; 

} 

    /**
     * Identical to vonNoymanAlgorithmInt(freqOrderedVNN) except that the resultant
     * list of decimal digits is returned as a BitString[] instead of an int[].
     */
    public static final BitString[] vonNoymanAlgorithm(/* IN/OUT */ TreeSet freqOrderedVNN)
            throws NoSuchElementException
    {
        int[] leafsAtLevel = vonNoymanAlgorithmInt(freqOrderedVNN);
        BitString[] retValue = BitString.toBitStringArr(leafsAtLevel);
        Assertion.assert(leafsAtLevel.length == retValue.length);
        return retValue;
    }

    /**
     * Returns an array of BitStrings which represents the Huffman codes of a Huffman
     * tree built using leafsAtLevelList. The codes are ordered from shortest to
     * longest.
     */
    public static BitString[] getHuffmanCodes(BitString[] leafsAtLevelList) {
        // need one Huffman code per leaf so count leaves
        int numCodesNeeded = 0;
        for (int i = 0; i < leafsAtLevelList.length; i++) {
            numCodesNeeded += leafsAtLevelList[i].bitPattern();
        }

        BitString[] retValue = new BitString[numCodesNeeded];
        int pos = 0;
        int value = 0;
        for (int i = 0; i < leafsAtLevelList.length; i++) { // level
            for (int j = 0; j < leafsAtLevelList[i].bitPattern(); j++) { // leaf
                // slots are filled from the last array position toward the first
                retValue[numCodesNeeded - 1 - pos] = new BitString((short) (i), value);
                pos++;
                value++;
            }
            value <<= 1;
        }

        return retValue;
    }

    /** Constructor */
    public VonNoymanNode(BitString index) {
        freq = 1;
        leafsAtLevel = new int[] {1};
        this.index = index;
    }
    /**
     * Helper method for VonNoymanAlgorithm methods.
     */
    public void add(VonNoymanNode rValue) {
        // add freq attributes
        freq += rValue.freq;

        // add leafsAtLevel attributes
        int[] longer, shorter;
        if (leafsAtLevel.length > rValue.leafsAtLevel.length) {
            longer = leafsAtLevel;
            shorter = rValue.leafsAtLevel;
        } else {
            longer = rValue.leafsAtLevel;
            shorter = leafsAtLevel;
        }

        // shift every level down by one, then sum the leaf counts level by level
        int[] temp = new int[longer.length + 1];
        temp[0] = 0;
        for (int i = 0; i < longer.length; i++) {
            temp[i + 1] = longer[i] + (i < shorter.length ? shorter[i] : 0);
        }
        leafsAtLevel = temp;

        // ignore classindex attribute for this operation
    }

    public final void incrFreq() {
        freq++;
    }

    public final String toString() {
        return index.toString() + "\t(" + freq + ")";
    }

    public final boolean equals(Object obj) {
        if ((obj != null) && (obj instanceof VonNoymanNode)) {
            VonNoymanNode vnn = (VonNoymanNode) obj;
            return freq == vnn.freq && index.equals(vnn.index);
        }
        return false;
    }

    public final int compareTo(VonNoymanNode vnn) {
        long diff = freq - vnn.freq;
        if (diff > 0) {
            return 1;
        } else if (diff < 0) {
            return -1;
        } else {
            return index.compareTo(vnn.index);
        }
    }

    public final int compareTo(Object obj) {
        return compareTo((VonNoymanNode) obj);
    }
}
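
The shift-and-add bookkeeping used in the VonNoymanNode listing above can be tried in isolation. The following is a minimal, self-contained sketch and not the thesis code: the names `LeafCountSketch`, `merge`, and `leafCountsPerLevel` are hypothetical, and a `java.util.PriorityQueue` with an insertion-order tie-break is assumed in place of the `TreeSet` of nodes. It shows how merging the two least frequent subtrees reduces to shifting both leaf-count arrays down one level and adding them.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Illustrative sketch (hypothetical names): leaf counts per level without building a tree. */
class LeafCountSketch {

    /** A partial tree summarized by its total frequency and its leaf count at each level. */
    private static final class Node {
        final long freq;
        final int[] leafsAtLevel;
        final int order; // insertion order, used only to break frequency ties deterministically

        Node(long freq, int[] leafsAtLevel, int order) {
            this.freq = freq;
            this.leafsAtLevel = leafsAtLevel;
            this.order = order;
        }
    }

    /** Merging two subtrees pushes every leaf one level down: shift by one slot, then add. */
    static int[] merge(int[] a, int[] b) {
        int[] longer = a.length >= b.length ? a : b;
        int[] shorter = a.length >= b.length ? b : a;
        int[] merged = new int[longer.length + 1]; // merged[0] stays 0: the new root is not a leaf
        for (int i = 0; i < longer.length; i++) {
            merged[i + 1] = longer[i] + (i < shorter.length ? shorter[i] : 0);
        }
        return merged;
    }

    /** Returns counts[i] = number of leaves at depth i for the given symbol frequencies. */
    static int[] leafCountsPerLevel(long[] freqs) {
        if (freqs.length == 0) {
            throw new IllegalArgumentException("at least one frequency is required");
        }
        if (freqs.length == 1) {
            return new int[] {0, 1}; // same special case as the listing: one leaf at level 1
        }
        PriorityQueue<Node> queue = new PriorityQueue<>((x, y) ->
                x.freq != y.freq ? Long.compare(x.freq, y.freq) : Integer.compare(x.order, y.order));
        int order = 0;
        for (long f : freqs) {
            queue.add(new Node(f, new int[] {1}, order++)); // a single leaf at level 0
        }
        while (queue.size() > 1) {
            Node first = queue.poll();
            Node second = queue.poll();
            queue.add(new Node(first.freq + second.freq,
                    merge(first.leafsAtLevel, second.leafsAtLevel), order++));
        }
        return queue.poll().leafsAtLevel;
    }

    public static void main(String[] args) {
        int[] counts = leafCountsPerLevel(new long[] {5, 9, 12, 13, 16, 45});
        System.out.println(Arrays.toString(counts)); // [0, 1, 0, 3, 2]
    }
}
```

For the six frequencies in `main`, the result says one leaf at level 1, three at level 3, and two at level 4, which satisfies 1/2 + 3/8 + 2/16 = 1 as a full binary tree must.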
APPENDIX B: SUMMARY OF EMPIRICAL DATA


To determine the performance of both RTA and ITA, we ran comparisons against
standard Huffman encoding, using the Canterbury Corpus [19] and the well-known
“Lenna” photograph [20]. The corpus is a suite of eleven files, commonly
used as an industry benchmark for testing lossless compression algorithms. This
collection was developed in 1997 as an improved version of the older Calgary corpus.
The files were chosen because their results on existing compression algorithms are
"typical", and so it is hoped that they will be representative for new methods of
compression as well. The Lenna image is stored as a Tagged Image File Format (TIFF)
file, lena.tif, and is a common benchmark for image compression. The files that
comprise the Canterbury Corpus are:


File            Abbrev   Category             Size in bytes
alice29.txt     text     English text                152089
asyoulik.txt    play     Shakespeare                 125179
cp.html         html     HTML source                  24603
fields.c        Csrc     C source                     11150
grammar.lsp     list     LISP source                   3721
kennedy.xls     Excl     Excel Spreadsheet          1029744
lcet10.txt      tech     Technical writing           426754
plrabn12.txt    poem     Poetry                      481861
ptt5            fax      CCITT test set              513216
sum             SPRC     SPARC Executable             38240
xargs.1         man      GNU manual page               4227


Table 15: Canterbury Corpus Test Suite 
The following graphs summarize the results of RTA, ITA, and Huffman encoding
against the test suite files mentioned above. The title of each graph identifies the file,
and the tabular data represents the optimal compression for the given n. The ITA results
of each graph represent the optimal combination of |iBits| + |jBits| = n for each file. The
RTA results do not extend beyond n = 24 due to the exponential memory
requirements of this approach.



Chart 2: Compression Results for Test File fields.c

Chart 13: Compression Results for Test File lena.tif
APPENDIX C: MAPPING OF ASCII KEYBOARD CHARACTERS BY RTA


ASCII                           Index
Character   Value   Binary      Tree
space       32      00100000    2
!           33      00100001    7
"           34      00100010    6
#           35      00100011    6
$           36      00100100    2
%           37      00100101    6
&           38      00100110    6
'           39      00100111    5
(           40      00101000    2
)           41      00101001    2
*           42      00101010    2
+           43      00101011    6
,           44      00101100    4
-           45      00101101    4
.           46      00101110    4
/           47      00101111    4
0           48      00110000    2
1           49      00110001    2
2           50      00110010    2
3           51      00110011    2
4           52      00110100    2
5           53      00110101    2
6           54      00110110    2
7           55      00110111    5
8           56      00111000    2
9           57      00111001    2
:           58      00111010    2
;           59      00111011    2
<           60      00111100    2
=           61      00111101    2
>           62      00111110    2
?           63      00111111    2
@           64      01000000    1
A           65      01000001    1
B           66      01000010    1
C           67      01000011    6
D           68      01000100    1
E           69      01000101    5
F           70      01000110    5
G           71      01000111    5
H           72      01001000    1
I           73      01001001    7
J           74      01001010    4
K           75      01001011    6
L           76      01001100    4
M           77      01001101    4
N           78      01001110    4
O           79      01001111    4
P           80      01010000    1
Q           81      01010001    1
R           82      01010010    1
S           83      01010011    6
T           84      01010100    1
U           85      01010101    1
V           86      01010110    5
W           87      01010111    5
X           88      01011000    3
Y           89      01011001    3
Z           90      01011010    3
[           91      01011011    3
\           92      01011100    3
]           93      01011101    3
^           94      01011110    3
_           95      01011111    3
`           96      01100000    1
a           97      01100001    1
b           98      01100010    1
c           99      01100011    6
d           100     01100100    1
e           101     01100101    1
f           102     01100110    1
g           103     01100111    5
h           104     01101000    1
i           105     01101001    1
j           106     01101010    1
k           107     01101011    6
l           108     01101100    1
m           109     01101101    1
n           110     01101110    4
o           111     01101111    4
p           112     01110000    1
q           113     01110001    1
r           114     01110010    1
s           115     01110011    1
t           116     01110100    1
u           117     01110101    1
v           118     01110110    1
w           119     01110111    1
x           120     01111000    1
y           121     01111001    1
z           122     01111010    1
{           123     01111011    1
|           124     01111100    1
}           125     01111101    1
~           126     01111110    1


Table 15: ASCII Mapping by RTA 
APPENDIX D: SAMPLE OUTPUT FROM ITA COMPRESSION PROGRAM


Compression method: 4 (indexed trees)
Compressions performed: 1

FILENAME: c:\My Documents\filesToCompress\sum
n          : 16
iBits      : 4
jBits      : 12
offset     : 0
file size  : 37.344 KB
header size: 3719 bytes
HUF size   : 24.199 KB
change     : -35.1%
MTC size   : 23.258 KB
change     : -37.6%

tree   strings   leafs
0      39.0%     892
1      2.5%      61
2      6.7%      222
3      16.0%     221
4      4.1%      175
5      2.7%      110
6      13.0%     282
7      8.5%      166
8      2.9%      33
9      1.5%      47
10     0.5%      46
11     0.3%      17
12     0.3%      13
13     1.2%      30
14     0.4%      18
15     0.5%      56
APPENDIX E: RTA COMPRESSION FORMAT


n                                                                      5 bits
# of offset (uncompressed) bits f, where f < n                         5 bits
actual offset bits                                                     f bits
# of remainder (uncompressed) bits r, where r < n                      5 bits
actual remainder bits                                                  r bits

class tree 0
    bits used b to represent # leafs L at each level of Huffman tree   5 bits
    # of levels v in Huffman tree                                      5 bits
    leafs in level v - 1 of Huffman tree (least freq)                  b bits
    leafs in level v - 2 of Huffman tree                               b bits
    ...
    leafs in level 0 of Huffman tree (most freq)                       b bits
    level v - 1 class(es) (longest code)                               ceil(log2 n) bits
    level v - 2 class(es)                                              ceil(log2 n) bits
    ...
    level 0 class(es) (shortest code)                                  ceil(log2 n) bits

class tree 1
    (same as above)
...
class tree n - 1
    (same as above)

rotation tree n
    bits used b to represent # leafs L at each level of Huffman tree   5 bits
    # of levels v in Huffman tree                                      5 bits
    leafs in level v - 1 of Huffman tree (least freq)                  b bits
    leafs in level v - 2 of Huffman tree                               b bits
    ...
    leafs in level 0 of Huffman tree (most freq)                       b bits
    level v - 1 rotation(s) (longest code)                             ceil(log2 n) bits
    level v - 2 rotation(s)                                            ceil(log2 n) bits
    ...
    level 0 rotation(s) (shortest code)                                ceil(log2 n) bits

# bitstrings processed                                                 32 bits
encoded rotations                                                      variable
encoded class index                                                    variable
encoded rotations                                                      variable
encoded class index                                                    variable
...
last encoded rotations                                                 variable
last encoded class index                                               variable
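
Because each tree in this header is recorded only as a leaf count per level, followed by the classes or rotations grouped by level, a decoder has to regenerate the code words itself. The sketch below illustrates one way to do so, canonical code assignment, which is what the getHuffmanCodes method in the VonNoymanNode listing appears to perform. It is an illustration under stated assumptions and not the thesis implementation: `CanonicalCodeSketch` and `codesFromLeafCounts` are hypothetical names, code words are returned as binary strings with the shortest first, and the input array is indexed by level (so counts read from the header, which lists the deepest level first, would be reversed before use).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch (hypothetical names): code words rebuilt from leaf counts per level. */
class CanonicalCodeSketch {

    /**
     * leafsAtLevel[i] is the number of leaves at depth i.
     * Returns one code word per leaf, shortest code words first.
     */
    static List<String> codesFromLeafCounts(int[] leafsAtLevel) {
        List<String> codes = new ArrayList<>();
        int value = 0; // next unused code value at the current depth
        for (int level = 0; level < leafsAtLevel.length; level++) {
            for (int leaf = 0; leaf < leafsAtLevel[level]; leaf++) {
                codes.add(toBits(value, level));
                value++;
            }
            value <<= 1; // descend one level: append a 0 bit to the next unused value
        }
        return codes;
    }

    /** Formats the low `length` bits of value as a zero-padded binary string. */
    private static String toBits(int value, int length) {
        StringBuilder sb = new StringBuilder();
        for (int bit = length - 1; bit >= 0; bit--) {
            sb.append((value >> bit) & 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // one leaf at level 1, three at level 3, two at level 4
        System.out.println(codesFromLeafCounts(new int[] {0, 1, 0, 3, 2}));
        // [0, 100, 101, 110, 1110, 1111]
    }
}
```

No code word in the printed list is a prefix of another, so the per-level counts together with the level-ordered symbol list are enough to decode.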
APPENDIX F: ITA COMPRESSION FORMAT


iBits                                                                  5 bits
jBits                                                                  5 bits
# of offset (uncompressed) bits f, where f < n                         5 bits
actual offset bits                                                     f bits
# of remainder (uncompressed) bits r, where r < n                      5 bits
actual remainder bits                                                  r bits

index tree 0
    # of non-empty levels v in Huffman tree                            5 bits
    first non-empty level F                                            5 bits
    bits used b to represent # leafs L at each level of Huffman tree   5 bits
    leafs in level F + v - 1 of Huffman tree (least freq)              b bits
    leafs in level F + v - 2 of Huffman tree                           b bits
    ...
    leafs in level F of Huffman tree (most freq)                       b bits
    level F + v - 1 jWord (longest code)                               jBits bits
    level F + v - 2 jWord                                              jBits bits
    ...
    level F jWord (shortest code)                                      jBits bits

index tree 1
    (same as above)
...
index tree 2^iBits - 1
    (same as above)

index tree 2^iBits (iTree)
    # of non-empty levels v in Huffman tree                            5 bits
    first non-empty level F                                            5 bits
    bits used b to represent # leafs L at each level of Huffman tree   5 bits
    leafs in level F + v - 1 of Huffman tree (least freq)              b bits
    leafs in level F + v - 2 of Huffman tree                           b bits
    ...
    leafs in level F of Huffman tree (most freq)                       b bits
    level F + v - 1 iWord (longest code)                               iBits bits
    level F + v - 2 iWord                                              iBits bits
    ...
    level F iWord (shortest code)                                      iBits bits

# bitstrings processed                                                 32 bits
encoded iWord                                                          variable
encoded jWord                                                          variable
encoded iWord                                                          variable
encoded jWord                                                          variable
...
last encoded iWord                                                     variable
last encoded jWord                                                     variable
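
The field names in this format suggest that each n-bit source string, with n = iBits + jBits, is split into an iWord and a jWord, and that the iWord selects which index tree encodes the jWord. The sketch below illustrates only that split. It assumes the iWord is the high-order iBits of the string, in line with the abstract's description of dispersal by the first few bits of each string, and that n fits in 31 bits; `ItaSplitSketch`, `iWord`, and `jWord` are hypothetical names, not identifiers from the thesis code.

```java
/** Illustrative sketch (hypothetical names): splitting an n-bit string into iWord and jWord. */
class ItaSplitSketch {

    /** High-order iBits of an n-bit string, where n = iBits + jBits (assumed n <= 31). */
    static int iWord(int bitString, int iBits, int jBits) {
        return (bitString >>> jBits) & ((1 << iBits) - 1);
    }

    /** Low-order jBits of the same string. */
    static int jWord(int bitString, int jBits) {
        return bitString & ((1 << jBits) - 1);
    }

    public static void main(String[] args) {
        int iBits = 4, jBits = 12;   // the split shown in the Appendix D sample output
        int source = 0xABCD;         // an arbitrary 16-bit string chosen for illustration
        int tree = iWord(source, iBits, jBits);
        int symbol = jWord(source, jBits);
        System.out.printf("tree %d, jWord 0x%03X%n", tree, symbol); // tree 10, jWord 0xBCD
    }
}
```

With iBits = 4 there are 2^4 = 16 possible iWord values, which matches the 16 trees (0 through 15) listed in the Appendix D sample output.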